ABSTRACT

A complete, real-time channel vocoder delivering good speech quality
at a 2400-bit/second data transmission rate was implemented using purely
digital circuitry in the form of a high-speed programmed microprocessor.

Necessary algorithms are presented and their effect on the machine design
is discussed in detail. The end product is a very high-speed computing
machine (measured in program throughput terms). It turned out to have a high
degree of programming flexibility, which would make it adaptable to other
tasks. This was a bonus, not an original goal. The project was conceived
and successfully realized as the most practical way to build a vocoder for
actual use in the LES-8/9 satellite communications system.
I. INTRODUCTION

A digital vocoder is a device which extracts from samples of speech those
attributes which are most essential for accurate synthetic speech reproduction,
subject to the constraint that the data link between the transmitter and
receiver utilizes a minimum data rate. The microprocessor described in this
report was designed specifically to implement a 2400-bit/second channel vocoder
delivering good speech quality in real time. The judge of what does and does
not constitute acceptable quality is the human ear. The criteria are thus highly
subjective and ill suited to precise mathematical description. As a result,
vocoder algorithms are largely empirical in nature. The good ones have taken
many years to develop. Not unexpectedly, however, one fundamental fact has
emerged: computational complexity and final speech quality for a given data
rate are directly related. The Gold-Rader-Tierney channel vocoder algorithm
used in this project was developed over a period of several years by a number
of people with the goal of excellent speech quality. It is thus rather complex
and takes a great deal of computation to implement. Several successful
implementations were built using analog circuitry for the spectrum analysis and
synthesis, and digital pitch detectors. However, the algorithm has never before
been implemented in an all-digital machine in real time, because of the high
required machine speed. The processor described in this report is the first
machine with sufficiently high-speed capabilities to realize the Gold et al.
channel vocoder algorithm in full duplex real-time form. This was achieved using
commercially available Schottky-clamped TTL logic and an 8-MHz system clock
rate. To give a feeling for the speeds involved, an IBM 370 has a program
throughput rate roughly two orders of magnitude too slow.
Many vocoder algorithms were developed under the guidance of B. Gold over
the last decade or more. A great deal of algorithm simulation on large machines
has been done. The most recent work of interest for this project was done by P.
Demko and J. Tierney, both of Group 27 (unpublished internal report). They made
use of a general purpose computer to simulate a 16-channel vocoder in non-real
time. Finite length words were used in order to determine the minimum required
computer accuracy for acceptable reproduction of vocoded speech. A given word
length, once decided upon, was used throughout the computation, all of which was
done in fixed point arithmetic. Under constraints of this nature, computational
optimization is equivalent to finding optimal scaling of the processed numbers
such that dynamic range is maximized. The results of their work are therefore
of great value to anyone interested in building a fixed point microprocessor to
implement a channel vocoder. Specifically, their results indicated that a 12-bit
coefficient and sample word together with 16-bit processing was a good choice.
18-bit processing gave only marginal improvement. The machine described in
this report follows the Tierney-Demko simulation fairly closely. One major
exception is the use of 16-bit operations throughout, including coefficient
lengths. Other exceptions are of a more detailed and minor nature and will be
pointed out as we come to the appropriate point.

The main thrust of this report is the description of the actual computing
machine; however, since the reason for its existence is a realization of a
channel vocoder, a description of the salient features of the algorithm
implemented will be given.

The method by which the machine was designed will also be discussed. This
should be helpful to others faced with similar design problems. At the
beginning of the project I knew of no machine design capable of the high
program throughput speed needed for the channel vocoder algorithm. Existing
microprocessor designs and standard digital design procedures were, with some
exceptions, not too helpful. A new approach to the problem had to be worked
out. The eventual success of the project was, I believe, almost entirely due
to the design approach developed; that is, simultaneous and closely coupled
design of both hardware and software. To start with, we knew the algorithm
to be realized, and could therefore write down all the mathematical expressions,
and thus the required computational forms. Using available integrated circuits,
the problem then reduced to fitting them together in such a way that the
computational forms could be executed in the most efficient way. Each required
operation was carefully scrutinized and implemented in software or in hard-wired
logic, whichever of the two achieved greater efficiency. As an example,
consider a sequence of identical operations performed on varying data. In
software, it is usual in such cases to make use of a so-called DO LOOP. An index
is set, a test performed and a decision made whether to return to the beginning
of the sequence or to continue. Symbolically, such a procedure may be expressed
as follows:
      1.      Set I = 0, K = M
  ->  2.      Take in New Data
      3.      \
      .        |  Perform Required Operations
      n       /
      n + 1.  Set I = I + 1
      n + 2.  If (K - I) > 0 go to Step 2, continue otherwise
      n + 3.
      n + 4.
      n + 5.

In this sequence, steps (n + 1) and (n + 2) do not contribute to the
computation, but only to the control. They could thus be considered overhead,
costing both program memory and execution time. If, however, we eliminate the
DO LOOP control instructions by writing out every pass explicitly, many more
program steps would have to be written. An optimum solution therefore is to
build in a DO LOOP mechanism in hardware. For example, one which will give rise
to the following software:

      1.      Do 2, n, M
  ->  2.      Take in New Data
      3.      \
      .        |  Perform Required Operations
      n       /
      n + 1
      n + 2
      n + 3

The above means DO steps 2 through n, M times, and on completion continue
with the program. It should be noted that steps n + 1 and n + 2 are now no
longer in the loop. All needed controls are in line 1. If each program line
takes T sec to execute, the above solution, besides saving two lines of program,
also shortens the execution time by 2MT sec. For large M this can be quite
considerable.
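
The saving can be checked with a short count of executed program lines. The
following Python sketch is only an illustration: the function names and the
sample values of n, M and T are assumptions chosen for the example, not
figures from the machine.

```python
def software_loop_lines(n, m):
    """Lines executed when the loop is controlled in software.

    Step 1 runs once. Each of the m passes executes the (n - 1) useful
    steps 2..n plus the two control steps n+1 (increment) and n+2 (test).
    """
    return 1 + m * ((n - 1) + 2)


def hardware_loop_lines(n, m):
    """Lines executed with a hardware DO: line 1 once, steps 2..n m times."""
    return 1 + m * (n - 1)


def time_saved(m, t_line):
    """Execution time saved, 2*M*T, for a per-line time t_line in seconds."""
    return 2 * m * t_line


if __name__ == "__main__":
    n, m, t_line = 10, 49, 1e-6  # illustrative values only
    lines = software_loop_lines(n, m) - hardware_loop_lines(n, m)
    print(lines, "executed lines saved,", time_saved(m, t_line), "sec saved")
```

The difference between the two counts is always 2M lines, independent of the
length of the loop body, which is why the gain grows with the repeat count.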

The software operations, instruction set, and machine architecture are
designed together as described, modifying all three as necessary while working
through the required mathematical expressions. It will, of course, be
appreciated that after several steps in the process, one will have to go back
to the beginning and re-assess the impact of the latest changes on the previous
computational procedures. Thus, this is an iterative design procedure with
feedback, which stops when all required computations are implementable and the
job can be done within the required time. In terms of engineering esthetics, it
is a very satisfying process in that it meets all requirements, and allocates
tasks efficiently between hardware and software.

After following the above design philosophy for some time it suddenly
dawned on me that the design mode allows computational time problems to be dealt
with especially easily. It is quite straightforward to add more parallelism,
more pipelining, and more hardware as required without starting over. Another
very interesting observation was made when the design was completed. Despite
the decidedly dedicated nature of the design, the end product is by any
reasonable definition a general purpose computing machine. It does have
peripherals which are specifically geared for the vocoding process, such as
pre-sample filtering, special format data storage, and acquisition systems.
However, the machine itself is quite general purpose. It can do addition,
subtraction, and multiplication. It has DO LOOPS and conditional and
unconditional jumps. It can perform conditional operations and make decisions
as a result of some operations. It also has been programmed to implement a
self-diagnostics program, a task totally different from the vocoding algorithm.

When faced with the need for a machine to implement a vocoder or a task of
similar complexity, it is perhaps natural to draw on the extensive past
experience accumulated by general purpose computer designers. This leads
directly to a simple "classical" architecture which is based on familiar design
concepts and is capable of achieving the fastest possible machine cycle times,
a fact frequently quoted as its justification. Implementations along these
lines can easily achieve execution times per instruction a factor of 5 or so
shorter than those of the currently proposed machine. Complex signal processing
operations are then carried out by very complex and lengthy software.

Perhaps the single most interesting result to emerge from this project
is the fact that substantially greater total program throughput rate may be
achieved by settling for a slower basic cycle time, but concentrating instead
on making each instruction as powerful and efficient as possible. This claim
is borne out by the fact that at this writing no full-duplex channel vocoder of
similar complexity has been implemented digitally, despite a keen interest in
such devices, other than the machine described here. It is believed that for
tasks involving signal processing or filtering, where maximum throughput of mass
real time data is the keynote, the proposed approach will yield a more efficient
end product. The chief contributing factor is extensive use of "firmware."
This term means PROM-implemented special functions (like log tables, for
example) which as a result can be recalled with a single command (not unlike a
subroutine in a Fortran program). This greatly shortens the required software
and correspondingly speeds execution. This development would not have been
possible without the recent introduction of LSI and large PROMs, since they make
it possible to realize custom parallel architecture and fast look-up tables with
comparatively little design effort, at small cost in power, size, parts count
and ultimately dollars.

After the final design is completed, debugged and working, it is almost
inevitable that the question comes up: "If it had to be done again would I do
it the same way?" Equally inevitably the answer is: "not quite." This case is
of course no exception. Following is a number of comments arrived at by
hindsight. They may prove useful to anyone faced with a similar design problem.

The difficulty in creating a sufficiently flexible addressing scheme was
underestimated. As a result, the part of the machine dealing with this problem
was underdesigned at first, creating a lot of headaches later on. All this
would have been avoided if addressing had received greater attention at the
very outset. If this had been done, it is also very likely that a better, more
flexible addressing scheme would have evolved.

Difficulty of debugging rises exponentially with the number of ICs used.
It is therefore very important to include eventual debugging procedures in
the design process. This may raise the IC count slightly, but will repay
itself manyfold later on. One very attractive way to do this is to set aside a
reasonably large part of the program memory for a self-diagnostic routine.
This would exercise all possible machine modes one at a time (if feasible) or
jointly by operating on some predetermined numbers. During each operation the
output of the arithmetic section, for example, is monitored by comparing it
with a precomputed value. The whole procedure should be programmed in such
a way that if a disagreement is detected the error will have originated in only
a small part of the machine. In this way the detection of several errors should
pinpoint the malfunctioning of individual ICs. In the present machine
self-diagnostics was added at the end, and was available only by connecting a
separate, specially designed board. This was due mainly to the non-availability
of sufficiently large ROMs. The diagnostics unfortunately created problems of
its own, mainly due to propagation delays. It is felt that self-diagnostics
should be an integral and permanent part of the machine. If the machine is to
be a subsection of a much larger system composed of other programmable machines,
each with its own self-diagnostics, all of these diagnostic routines could be
tied together, making it possible to debug even very large systems with relative
ease and in a very small fraction of the time it would take otherwise. Since
self-diagnostics is essentially an exercising of the machine itself, it is
estimated that its implementation should raise the IC count by not more than 5%.
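
The self-diagnostic idea described above (exercise one function at a time on
predetermined numbers and compare with a precomputed value) can be sketched in
a few lines of Python. This is only an illustration of the principle: the
16-bit helper functions, the unit names and the test vectors are hypothetical
and are not taken from the actual diagnostic program.

```python
MASK16 = 0xFFFF  # 16-bit word length, as used throughout the machine


def add16(a, b):
    return (a + b) & MASK16


def sub16(a, b):
    return (a - b) & MASK16


def mul16(a, b):
    return (a * b) & MASK16


# Hypothetical model of a healthy arithmetic section, one entry per unit.
HEALTHY = {"adder": add16, "subtractor": sub16, "multiplier": mul16}

# (unit exercised, operand a, operand b, precomputed expected result)
TEST_VECTORS = [
    ("adder", 0x1234, 0x0FF0, 0x2224),
    ("subtractor", 0x0001, 0x0002, 0xFFFF),
    ("multiplier", 0x0003, 0x0007, 0x0015),
]


def run_self_test(units):
    """Return the names of units whose output disagrees with the stored value.

    Each vector exercises a single unit, so a disagreement points at a
    small part of the machine rather than at the machine as a whole.
    """
    return [name for name, a, b, expected in TEST_VECTORS
            if units[name](a, b) != expected]
```

As a usage example, replacing the adder by one whose bit 2 is stuck at zero,
`dict(HEALTHY, adder=lambda a, b: add16(a, b) & 0xFFFB)`, makes
`run_self_test` report only the adder.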

As already mentioned, the advocated design procedure is only possible due
to the introduction of LSI. With the appearance of even more complex LSI modules
the process becomes more flexible still. A good example of this would be the
use of the AM 2901 module. This is a 40-pin device containing an ALU, shift
registers, buffers, ROM and RAM memory arranged to create very primitive
arithmetic operations under control of the ROM, which is already
micro-programmed. A unit like this could have been used to advantage in both
the addressing and arithmetic sections. Another example would be the
incorporation of array multipliers, possibly custom designed, arranged on a
single chip.




II. THE VOCODING ALGORITHM


Essentially  the  program  consists  of  four  distinct  groups. 

2.1  Spectrum Analysis and Real-Time Pitch Computation

The spectrum analysis operation extracts the energy content of 16 bandpass
filters fed by the speech samples. The sampling period used was 140 μsec,
or a rate of 7.14 kHz. Let us assume $x(n)$ represents the current speech sample
(digitized to 12 bits in our case), and $x(n-1)$ the previous one. The first set
of computations are 49 bandpass poles defined by the following difference
equations:

$$y_i(n) = k_{1i}\,[\,2\,y_i(n-1) - x(n-1)\,] - k_2\,y_i(n-2) + x(n) \tag{1}$$

where

$$i = 1, 2, \ldots, 49$$

The $k_{1i}$ are 49 distinct constants, while $k_2$ is a single constant equal
for all 49 filters. The $y_i(n)$, $y_i(n-1)$ and $y_i(n-2)$ are current, past
and past twice removed filter outputs. Digital filters of this type are referred
to as recursive. It is important to note that they contain, by virtue of
$y_i(n-1)$ and $y_i(n-2)$, computational feedback. A system of this kind may
lead to instability, especially in fixed point arithmetic machines. This
problem, together with possible solutions, will be discussed later.

The $k_{1i}$ and $k_2$ are respectively given by

$$k_{1i} = r \cos(\beta_i T) \tag{2}$$

and

$$k_2 = r^2, \qquad r = \exp(-\alpha T) \tag{3}$$

where $T$ = sample period (140 μsec), $\alpha$ = real part of the complex pole
pair ($2\pi \times 60$ Hz), $\beta_i$ = imaginary part of the pole pair.
Table 2.1 gives the pole coordinates and a summary of the relevant features of
all 49 pole pairs. It will be seen that the poles lie on a line parallel to the
imaginary axis. Furthermore, their positions have been chosen such that if
combined in the manner to be described below, the bandpass filters formed will
approximate a linear phase characteristic. Filters of this type are referred to
as Lerner filters (Ref. 4).
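
A single pole pair of Eq. (1) can be sketched in Python as follows. This is a
minimal illustration, not the machine's implementation: it uses floating point
rather than 16-bit fixed point, the function names are hypothetical, and it
assumes $k_2 = r^2$ with $r = \exp(-\alpha T)$. Under that assumption the
impulse response of the section is $r^n \cos(n \beta_i T)$, which gives a
convenient check.

```python
import math

T = 140e-6                  # sample period in seconds (from the text)
ALPHA = 2 * math.pi * 60.0  # real part of every pole pair, in rad/sec


def pole_coefficients(f_imag_hz):
    """Return (k1, k2) for a pole pair whose imaginary coordinate is f_imag_hz."""
    r = math.exp(-ALPHA * T)
    beta = 2 * math.pi * f_imag_hz
    return r * math.cos(beta * T), r * r


def resonator(x, k1, k2):
    """Apply y(n) = k1*[2*y(n-1) - x(n-1)] - k2*y(n-2) + x(n) to the list x."""
    y = []
    y1 = y2 = x1 = 0.0  # y(n-1), y(n-2) and x(n-1), all initially at rest
    for xn in x:
        yn = k1 * (2.0 * y1 - x1) - k2 * y2 + xn
        y.append(yn)
        y2, y1, x1 = y1, yn, xn
    return y


if __name__ == "__main__":
    k1, k2 = pole_coefficients(160.0)  # first pole of Table 2.1
    impulse = [1.0] + [0.0] * 9
    print(resonator(impulse, k1, k2))
```

Note that each output costs only two multiplications, which is presumably why
the difference equation is written in the bracketed form of Eq. (1).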

The 49 bandpass poles are summed into 16 sets according to the pattern
shown in Table 2.2. Each pole has a weight attached to it, which starts with 0.5
for the first and then continues with alternating sign but of unit magnitude, to
end up with 0.5 again on the last pole for an odd number of poles and -0.5 for
an even number. Thus, the first output is:

$$f_1(n) = |\,0.5\,y_1(n) - y_2(n) + y_3(n) - 0.5\,y_4(n)\,| \tag{4}$$

and the eleventh

$$f_{11}(n) = |\,0.5\,y_{23}(n) - y_{24}(n) + 0.5\,y_{25}(n)\,| \tag{5}$$
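
The weighting rule behind Eqs. (4) and (5) can be written out as a short
Python sketch. The function names are hypothetical, and the grouping of poles
into channels (Table 2.2) is not reproduced here; the caller is assumed to
pass the pole outputs belonging to one channel.

```python
def lerner_weights(num_poles):
    """Weights 0.5, -1, +1, ..., ending in +0.5 (odd count) or -0.5 (even)."""
    if num_poles < 2:
        raise ValueError("a channel needs at least two poles")
    weights = [0.5]
    sign = -1.0
    for _ in range(num_poles - 2):  # interior poles: unit magnitude
        weights.append(sign)
        sign = -sign
    weights.append(0.5 * sign)      # last pole: half weight, next sign
    return weights


def channel_output(pole_outputs):
    """Rectified weighted sum of the pole outputs y_i(n) of one channel."""
    weights = lerner_weights(len(pole_outputs))
    return abs(sum(w * y for w, y in zip(weights, pole_outputs)))
```

For four poles the weights are 0.5, -1, 1, -0.5 as in Eq. (4), and for three
poles they are 0.5, -1, 0.5 as in Eq. (5).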

The envelopes of the rectified bandpass outputs $f_i(n)$ are lowpass filtered
using a third-order transitional Gaussian to 12 dB characteristic cutting off at
35 Hz. Here the design differs from the original, where a Bessel filter was
used. The Gaussian characteristic has a better step response and was therefore
chosen here. The filters are realized in two steps. A first order section whose
output is:

r^(n)  = f [2  r^(n-l)  - 2 (n-1)]  + f^(n)  (6) 

is  followed  by  a second  order  section: 
TABLE 2.1

LERNER POLE POSITIONS

Pole          Real              Imaginary         Resonant
Designation   Coordinate (Hz)   Coordinate (Hz)   Frequency (Hz)      Q

  1             60                 160               170.9            1.42
  2             60                 200               208.8            1.74
  3             60                 280               286.4            2.39
  4             60                 320               325.6            2.71
  5             60                 400               404.5            3.37
  6             60                 440               444.1            3.70
  7             60                 520               523.5            4.36
  8             60                 560               563.2            4.69
  9             60                 640               642.8            5.36
 10             60                 680               682.6            5.69
 11             60                 760               762.4            6.35
 12             60                 800               802.3            6.69
 13             60                 880               882.0            7.35
 14             60                 920               921.9            7.68
 15             60                1000              1001.8            8.35
 16             60                1040              1041.7            8.68
 17             60                1120              1121.6            9.35
 18             60                1160              1161.6            9.68
 19             60                1240              1241.4           10.35
 20             60                1280              1281.4           10.68
 21             60                1360              1361.3           11.34
 22             60                1400              1401.3           11.68
 23             60                1480              1481.2           12.34
 24             60                1560              1561.2           13.01
 25             60                1600              1601.2           13.34
 26             60                1680              1681.2           14.01
 27             60                1760              1761.0           14.68
 28             60                1800              1801.0           15.01
 29             60                1880              1881.0           15.68
 30             60                1960              1961.0           16.34
 31             60                2040              2040.9           17.01
 32             60                2080              2080.9           17.34
 33             60                2160              2160.8           18.01
 34             60                2240              2240.8           18.67
 35             60                2320              2320.8           19.34
 36             60                2400              2400.8           20.01
 37             60                2440              2440.7           20.34
 38             60                2520              2520.7           21.01
 39             60                2600              2600.7           21.67
 40             60                2680              2680.7           22.34
 41             60                2760              2760.6           23.01
 42             60                2800              2800.6           23.34
 43             60                2880              2880.6           24.01
 44             60                2960              2960.6           24.67
 45             60                3040              3040.6           25.34
 46             60                3120              3120.6           26.01
 47             60                3200              3200.6           26.67
 48             60                3280              3280.6           27.34
 49             60                3320              3320.5           27.67
Here a and the remaining section coefficients are constants computed to give
the required characteristic.

The outputs $C_i(n)$ give the energy in the 16 channels and form the output
of the spectrum analysis section. Much of the basic bandwidth compression of
the vocoder has occurred at this point. The original sampled speech in a
bandwidth of about 3 kHz is now represented by 16 abstracted spectral-energy
functions whose total bandwidth is just 16 × 35 = 560 Hz.

The pitch extraction, both real time and non-real time, is described in
great detail in Ref. 1. Thus, only those parts necessary to illustrate the
computational structure will be discussed here. The purpose of the real-time
pitch computation is to make preliminary estimates of the pitch period. This is
done on samples of only the bottom 900 Hz, since it is known that the
fundamental pitch frequency will never be in excess of 900 Hz. The actual
current sample is labelled $x_{n+1}$, the previous sample once removed $x_n$ and
the sample twice removed $x_{n-1}$. For computational purposes, however, $x_n$
is treated as the current sample, thereby making $x_{n+1}$ the immediate future
and $x_{n-1}$ the immediate past sample. A parameter $\Delta$ is now defined by
(see Fig. 3 for flow diagram):


$$\Delta_0 = 0, \qquad \Delta_{n+1} = +1 \ \text{ if } x_{n+1} > x_n, \ \ldots \tag{8}$$

Comparing $\Delta_{n+1}$ with $\Delta_n$ then gives one of three outcomes:

      $x_n$ is a positive peak
      there is no peak
      $x_n$ is a negative peak


When peaks are detected, their magnitudes are stored in $P_{cp}$ and $P_{cn}$,
i.e., if $\Delta_{n+1} - \Delta_n = 2$, the content of the current positive peak
$P_{cp}$ is shifted into the past positive peak storage location labelled
$P_{pp}$ and $x_n$ is written into $P_{cp}$. Similarly, for a negative peak
$P_{cn}$ is shifted into the past negative position $P_{pn}$ and $x_n$ is placed
into $P_{cn}$. For the majority of cases, when no peak is detected, the contents
of $P_{cp}$, $P_{cn}$, $P_{pp}$ and $P_{pn}$ are not disturbed. At any one
sample time, magnitudes of current and past positive and negative speech
waveform peaks are available. Six parameters, defined in Eq. (9), are next
formed.


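$$
\begin{aligned}
m_0 &= |P_{cp}| & m_3 &= |P_{cn}| \\
m_1 &= |P_{cp} - P_{pn}| & m_4 &= |P_{cn} - P_{pp}| \\
m_2 &= \begin{cases} |P_{cp} - P_{pp}| & \text{if } P_{cp} \ge P_{pp} \\ 0 & \text{for } P_{cp} < P_{pp} \end{cases}
& m_5 &= \begin{cases} |P_{cn} - P_{pn}| & \text{if } P_{cn} \ge P_{pn} \\ 0 & \text{for } P_{cn} < P_{pn} \end{cases}
\end{aligned}
\tag{9}
$$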

The detailed rationale for the above choice of measurements is described
in Ref. 1. Basically, if time intervals between them are measured, $m_0$,
$m_1$, $m_3$ and $m_4$ give a good indication of the period for wave shapes
with a strong fundamental component present; $m_2$ and $m_5$ provide a correct
period for strong second harmonics and only some fundamental component waves.
This information is extracted by the following procedure.
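
The peak bookkeeping and the six measurements can be illustrated with a small
Python sketch. Several points are assumptions made only for this example: the
class and method names are hypothetical, peaks are found by direct comparison
of three consecutive samples (equal neighbouring samples are ignored), the
stored values are the signed samples $x_n$, the storage locations start at
zero, and the measurements are indexed $m_0$ to $m_5$.

```python
class PeakTracker:
    """Holds current/past positive and negative peaks and forms m_0..m_5."""

    def __init__(self):
        self.p_cp = self.p_pp = 0.0  # current and past positive peak
        self.p_cn = self.p_pn = 0.0  # current and past negative peak

    def update(self, x_prev, x_cur, x_next):
        """Examine x_cur; return +1 for a positive peak, -1 for a negative
        peak, 0 otherwise. Storage is only disturbed when a peak is found."""
        if x_cur > x_prev and x_cur > x_next:
            self.p_pp, self.p_cp = self.p_cp, x_cur
            return 1
        if x_cur < x_prev and x_cur < x_next:
            self.p_pn, self.p_cn = self.p_cn, x_cur
            return -1
        return 0

    def measurements(self):
        """Return [m_0, ..., m_5] from the four stored peak values."""
        m0 = abs(self.p_cp)
        m1 = abs(self.p_cp - self.p_pn)
        m2 = abs(self.p_cp - self.p_pp) if self.p_cp >= self.p_pp else 0.0
        m3 = abs(self.p_cn)
        m4 = abs(self.p_cn - self.p_pp)
        m5 = abs(self.p_cn - self.p_pn) if self.p_cn >= self.p_pn else 0.0
        return [m0, m1, m2, m3, m4, m5]
```

In use, `update` would be called once per sample period with the three most
recent samples, and `measurements` read whenever a peak has been reported.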


For each $m_i$, whenever a new $m_i$ is computed, for a time $T_i$, called the
blanking period, no computations are performed; then a parameter $\alpha_i$ is
computed, given by

$$\alpha_i = m_i \exp(-N \ln 2 / P_{av}) \tag{10}$$

where $N$ is effectively zero during blanking and is then incremented by unity
every sample period. $\alpha_i$ therefore represents an exponential run-down
which reaches half its original value when $N = P_{av}$. Updating of $\alpha_i$
stops when a new $m_i$ is found which is not less than the current value of
$\alpha_i$. The time, in multiples of the sampling period, from the beginning of
the blanking interval until the current cessation of the $\alpha_i$ rundown is
stored in $P_i$, which in turn defines all the above parameters thus:

$$P_{av}^{\,new} = \tfrac{1}{2}\left(P_{av}^{\,old} + P_i\right), \qquad T_i = 0.4\,P_{av}^{\,new} \tag{11}$$


The flow diagram for the above is shown in Fig. 1. The six $P_i$ are the
initial pitch estimates. Blanking and run-down procedures are helpful in
reducing spurious very short pitch period estimates and those produced by noise.
Three sets of $P_i$ are kept in memory: the current set and the two most recent
past sets.
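
The blanking and run-down procedure for one of the six measurements can be
sketched in Python as below. This is an illustration under stated assumptions,
not the machine's program: the class name, the initial average period of 40
samples and the averaging rule $P_{av}^{new} = (P_{av}^{old} + P_i)/2$ are
assumptions of the sketch, and a measurement value of zero is taken to mean
"no new measurement at this sample".

```python
import math


class RundownDetector:
    """Blanking followed by exponential run-down for one measurement m_i."""

    def __init__(self, p_av=40.0):
        self.p_av = p_av           # running average period, in samples
        self.blank = 0.4 * p_av    # blanking period T_i, in samples
        self.m_ref = 0.0           # value of the last accepted measurement
        self.elapsed = 0           # samples since the last accepted one
        self.period = 0            # latest period estimate P_i, in samples

    def threshold(self):
        """Current run-down value: halves every p_av samples after blanking."""
        n = max(0.0, self.elapsed - self.blank)
        return self.m_ref * math.exp(-n * math.log(2.0) / self.p_av)

    def step(self, m):
        """Advance one sample period; return True if m gives a new estimate."""
        self.elapsed += 1
        if self.elapsed <= self.blank:
            return False           # inside the blanking period: ignore input
        if m > 0.0 and m >= self.threshold():
            self.period = self.elapsed
            self.p_av = 0.5 * (self.p_av + self.period)
            self.blank = 0.4 * self.p_av
            self.m_ref = m
            self.elapsed = 0
            return True
        return False
```

With a strong measurement every 50 samples and weaker ones in between, the
weak ones are rejected either by the blanking period or by the run-down
threshold, so `period` settles at 50.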

Fig. 1. Real-time pitch computation flow diagram.

2.2  Non-Real Time Pitch Computation

The remainder of the pitch extraction process depends only on the computed
18 values of $P_i$. Also, since a new pitch estimate has to be made only once
every 10 msec (this corresponds to slightly over 71 sample periods), the
computation from now on can, within broad limits, be done whenever convenient.
It does not have to be finished during any one sample period. It may be spread
out over several or done all at once every 10 msec. The term "non-real time
pitch computation" may be misleading, since the operation does result, in
conjunction with the rest of the algorithm, in real time speech processing. The
name merely designates those parts of the computations which do not have to be
performed every input-sample period. The approach adopted here was to do the
whole non-real-time pitch computation all at once every 10 msec. This simplifies
control, since the operation does not have to be interrupted.

The computation consists of arranging a table of period estimates and then
choosing the most likely candidate. At the same time attention is also directed
to the energy of the signals involved. If a certain threshold level is not
exceeded, the samples are assumed to be caused by background noise and not
speech. They are therefore labeled for what they are, despite any possible
detected periodicity that may be associated with them. If the energy level is
exceeded but the dispersion of the pitch estimates is large (e.g., no two
estimates are alike), the speech sample may in fact have no defined pitch at
all, as in an incoherent sound like "s." Both of the above cases are "Hiss." In
all other cases a 7-bit word representing the most likely pitch period (referred
to as "Buzz") as a multiple of the sampling period is found. The above process
is commonly called the Buzz-Hiss or voiced-unvoiced decision. Table 2.3 gives
some examples to further clarify the procedure.

The word used to represent Hiss is 0000 111. Thus if the energy threshold is
not exceeded or the dispersion is too large, this word is put out for
transmission as the current pitch estimate.

TABLE 2.3

(Pitch Period = 140 μsec × Pitch Word)

Pitch Word   Decimal      Pitch          Pitch        Comments
             Equivalent   Period         Frequency

0000000         0                                     Hiss
0000111         7          0.98 msec     1020.4 Hz    Hiss
0001000         8          1.12 msec      892.9 Hz    True Pitch Estimate
0001111        15          2.10 msec      476.2 Hz    True Pitch Estimate
0010000        16          2.24 msec      446.4 Hz    True Pitch Estimate
0011111        31          4.34 msec      230.4 Hz    True Pitch Estimate
0100000        32          4.48 msec      223.2 Hz    True Pitch Estimate
0111111        63          8.82 msec      113.4 Hz    True Pitch Estimate
1000000        64          8.96 msec      111.6 Hz    True Pitch Estimate
1111111       127         17.78 msec       56.2 Hz    True Pitch Estimate
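
The relation between the 7-bit pitch word, the pitch period and the pitch
frequency in Table 2.3 can be checked with a few lines of Python. The function
names are hypothetical, and treating every word from 0 through 7 as Hiss is an
assumption read off the grouping in the table.

```python
SAMPLE_PERIOD_S = 140e-6   # sampling period, 140 microseconds
HISS_WORD = 0b0000111      # word transmitted for Hiss
MAX_HISS_WORD = 7          # assumption: words 0..7 are not true pitch values


def pitch_period_s(word):
    """Pitch period in seconds for a 7-bit pitch word."""
    if not 0 <= word <= 127:
        raise ValueError("pitch word must fit in 7 bits")
    return word * SAMPLE_PERIOD_S


def pitch_frequency_hz(word):
    """Pitch frequency in Hz; undefined for the zero word."""
    if word == 0:
        raise ValueError("word 0 has no pitch frequency")
    return 1.0 / pitch_period_s(word)


def is_hiss(word):
    """True if the word lies in the Hiss range of Table 2.3."""
    return word <= MAX_HISS_WORD


if __name__ == "__main__":
    for word in (8, 15, 16, 31, 32, 63, 64, 127):
        print(word, round(pitch_period_s(word) * 1e3, 2), "msec",
              round(pitch_frequency_hz(word), 1), "Hz")
```

The shortest true pitch word, 8, corresponds to 892.9 Hz, which agrees with
the earlier statement that the pitch frequency never exceeds 900 Hz.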

The computations involved here are as follows: Let the current 6 $P_i$ be
designated as column 1, the one preceding this column 2 and the one before
that as column 3. Also let $P_{ij}$ denote the entry in the $i$th column and
$j$th row. A 6 × 6 matrix is now formed which includes, besides the above 3 as
columns 1, 2 and 3, also the following:

$$
\begin{aligned}
P_{4j} &= P_{1j} + P_{2j} \\
P_{5j} &= P_{2j} + P_{3j} \\
P_{6j} &= P_{1j} + P_{2j} + P_{3j}
\end{aligned}
\tag{12}
$$

The reason for these columns is that for waves rich in harmonics the original
estimates $P_{1j}$ to $P_{3j}$ may erroneously detect second or third harmonics.
For such cases $P_{4j}$ to $P_{6j}$ are more likely to give the correct pitch
estimate. Next a set of window functions for each entry of the first column is
defined as

$$W_k(P_{ij}) = k \times 6.25\%\ \text{of}\ P_{ij} = 0.0625\,k\,P_{ij} \tag{13}$$

where

$$k = 1, 2, 3 \text{ and } 4.$$

A score $NC_{qk}$ is then incremented by unity every time an entry $P_{ij}$
falls within the window of the candidate $P_{1q}$, i.e., if

$$|P_{1q} - P_{ij}| \le W_k \tag{14}$$

For each of the 6 $q$ values all $i$ and $j$ are used. The score is augmented
by a bias term $BT_k$ (where $BT_k$ = 8, 6, 3, and 1 for $k$ = 1, 2, 3 and 4,
respectively), giving

$$NC'_{qk} = NC_{qk} + BT_k \tag{15}$$

The $P_{1q}$ resulting in the largest $NC'_{qk}$, denoted $P_{1\max}$ with score
$NC'_{1\max}$, is then compared with a threshold term CT. The pitch to be
transmitted is

$$
P_{TR} = \begin{cases} P_{1\max} & \text{if } NC'_{1\max} > CT \\ \text{Hiss word} & \text{if } NC'_{1\max} \le CT \end{cases}
\tag{16}
$$

The threshold term CT = 13.
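
The table-and-score procedure can be sketched in Python as follows. This is a
hedged illustration rather than the machine's program: the function names are
hypothetical, the coincidence test is assumed to be
$|P_{1q} - P_{ij}| \le 0.0625\,k\,P_{1q}$, the candidate itself is counted as
one coincidence because all $i$ and $j$ are used, and the energy test that
precedes the decision is left out.

```python
BIAS = {1: 8, 2: 6, 3: 3, 4: 1}  # bias term BT_k for window index k
CT = 13                          # threshold on the biased score
HISS_WORD = 0b0000111            # word transmitted for Hiss


def build_matrix(current, previous, before_previous):
    """Return the six columns: three measured sets and the three sums."""
    c1, c2, c3 = current, previous, before_previous
    c4 = [a + b for a, b in zip(c1, c2)]
    c5 = [b + c for b, c in zip(c2, c3)]
    c6 = [a + b + c for a, b, c in zip(c1, c2, c3)]
    return [c1, c2, c3, c4, c5, c6]


def choose_pitch(current, previous, before_previous):
    """Return the winning first-column period, or the Hiss word."""
    columns = build_matrix(current, previous, before_previous)
    entries = [p for column in columns for p in column]
    best_score, best_period = -1, None
    for candidate in columns[0]:             # the six current estimates
        for k, bias in BIAS.items():         # four window widths
            window = 0.0625 * k * candidate
            coincidences = sum(1 for p in entries
                               if abs(candidate - p) <= window)
            score = coincidences + bias
            if score > best_score:
                best_score, best_period = score, candidate
    return best_period if best_score > CT else HISS_WORD
```

When all 18 measured estimates agree on a period of 50 samples, the narrowest
window already collects 18 coincidences, the biased score is 26, and 50 is
returned as the pitch word.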
The process just described will, with reasonably high probability, give a
correct pitch estimate. Unfortunately, errors are also inevitable. Their
effect can, to a noticeable degree, be eliminated if data smoothing is employed
on the series of final pitch estimates. This process removes rapid alternations
between Buzz and Hiss and results in the smoothing out of implausibly rapid
time variations in the pitch estimates. The salient features of this
procedure follow.

At the end of the pitch evaluation, after the Buzz-Hiss decision has been made,
a shift register is loaded with a 1 if the current pitch word is Buzz and a 0 if
it is Hiss. Using an 8-bit register, the decisions for the past 8 pitch words
are stored. The low-pass filters in the spectrum analysis introduce a 60-msec
delay in the spectrum data. The pitch data, on the other hand, is delayed at
most 10 msec. So, in order to time-align pitch and spectrum information, the
current spectrum information should be combined with the pitch word computed
50 msec ago, i.e., 5 final pitch outputs of delay. The 5th bit in the above
pitch register then represents the pitch word of current interest. Let the 8
entries in the register be denoted by $X_i$. The first smoothing is done over
adjacent sets of 3 $X_i$. The center entry is always altered to the majority.
Thus, for example,

    X_{i-1}   X_i   X_{i+1}    Majority
       0       1       1          1        hence no change, X_i = 1
       0       1       0          0        X_i is changed to 0

The decisions are made on overlapping sets of 3, moving one $X_i$ down at a time.
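
The majority rule above can be sketched in a few lines. This is only an illustration: it assumes that every decision is taken on the unsmoothed register contents (the text does not say whether already-altered entries feed later windows) and that end entries lacking a full window are left unchanged. The name `majority_smooth` is hypothetical.

```python
def majority_smooth(bits, width=3):
    """Replace each center entry by the majority of its window.

    bits  : list of 0/1 Buzz-Hiss decisions (1 = Buzz, 0 = Hiss)
    width : odd window length (3 for the first smoothing pass)
    Assumption: windows read the original list, not partially smoothed data.
    """
    half = width // 2
    out = list(bits)
    for center in range(half, len(bits) - half):
        window = bits[center - half:center + half + 1]
        out[center] = 1 if sum(window) > half else 0
    return out


if __name__ == "__main__":
    print(majority_smooth([0, 1, 1]))  # center already agrees with majority
    print(majority_smooth([0, 1, 0]))  # center is changed to 0
```

Calling the same helper with `width=5` would give a five-entry majority pass of the same kind.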

A second set of eliminations is done on the results of the first but now over
a set of five. Again the center is changed according to the majority. The
overall pattern is shown below:


It  can  be  shown  that  this  pattern  is  representable  by  the  following  logic 
expressions : 


and 
The following editing process ensues:

    Decimal       Smoothed   Computed
    Equivalent      bit        bit       Editing Procedure
        0            0          0        Transmit Hiss word as computed.
        1            0          1        Transmit Hiss word despite computed Buzz.
        2            1          0        Find median over X_2 to X_8 and transmit
                                         as new Buzz.
        3            1          1        Transmit Buzz word as computed.

The first two and the last procedures are self-explanatory. The third one needs
more elaboration. If originally $X_5$ = 0, Hiss would have been the decision.
However, the smoothed value is 1, indicating that a Buzz word is needed. None is
available, so a new one has to be derived from neighboring ones. It appears that
for editing of this type, medians are an optimal choice. A median is defined as
that value of a distribution for which half its members are smaller and half
larger. Since the median estimation is done with actual pitch values, 8 past
pitch estimates (shown in the flow diagram) also have to be stored.
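
A minimal sketch of the four editing cases follows. The Hiss code value, the function name and the choice of which stored estimates enter the median are assumptions made for illustration; the text only states that a median over neighboring pitch values replaces the missing Buzz word.

```python
from statistics import median_low

HISS = 0  # hypothetical code for the Hiss word


def edit_pitch(smoothed_bit, computed_bit, pitch_word, neighbours):
    """Select the pitch word to transmit from the two Buzz-Hiss bits.

    smoothed_bit : decision after majority smoothing (1 = Buzz)
    computed_bit : decision as originally computed (1 = Buzz)
    pitch_word   : pitch value computed for the current interval
    neighbours   : stored past pitch estimates (assumed to exclude the current one)
    """
    case = 2 * smoothed_bit + computed_bit  # the "decimal equivalent"
    if case in (0, 1):
        return HISS            # Hiss, whether or not Buzz was computed
    if case == 3:
        return pitch_word      # Buzz as computed
    # case 2: Buzz is required but none was computed
    voiced = [p for p in neighbours if p != HISS]  # assumption: skip Hiss entries
    return median_low(voiced) if voiced else HISS
```

`median_low` is used so that the result is always one of the stored pitch values, which keeps the output a valid pitch word when the number of voiced neighbours is even.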

In  order  to  estimate  what  will  happen  in  the  future  as  well  as  in  the  past, 
72  samples  x(n)  of  the  speech  input  are  stored  in  FIFO  memory.  Samples  coming 
through  the  analog  pitch  channel  are  clocked  into  a buffer  directly  (the 
px(n  + 72)  buffer).  The  current  samples  to  be  used  for  computation  are  those 
extracted  from  FIFO  memory,  therefore,  the  px(n  + 72)  buffer  contains  the  value 
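of pitch 72 sample periods into the future. During every sample period,

$$X_{\max} = \begin{cases} px(n+72) & \text{if } px(n+72) > X_{\max} \\ X_{\max} & \text{otherwise} \end{cases}$$

and

$$X_{\min} = \begin{cases} px(n+72) & \text{if } px(n+72) < X_{\min} \\ X_{\min} & \text{otherwise} \end{cases} \tag{18}$$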

During the non-real-time pitch computation

$$A_1 = |X_{\max} - X_{\min}| \tag{19}$$

Also, $X_{\max}$ is reset to 00...0 and $X_{\min}$ to 01111...1. In this way,
during a current $P_{TR}$ computation $A_1$ gives the maximum deviation of pitch
samples 10 msec into the future, whereas $A_2$ provides the same information
10 msec into the past. The greater of the two values $A_1$ and $A_2$ is now
compared to a threshold level (the best value here is found empirically). If
this level is not exceeded, $P_{TR}$ is made equal to the Hiss word; otherwise
Buzz computation is pursued.
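
The running extremes and the deviation test can be sketched as follows. Several details are assumptions rather than statements from the text: a 16-bit word length, non-negative pitch-channel samples (implied by the all-zero reset of the maximum), and $A_2$ being simply the $A_1$ of the preceding interval. The class name is hypothetical.

```python
WORD_MAX = 0x7FFF  # 0111...1 in an assumed 16-bit word


class DeviationTracker:
    """Tracks the extremes of px(n + 72) between pitch computations."""

    def __init__(self):
        self.previous = 0  # plays the role of A_2 (assumed: last interval's A_1)
        self.reset()

    def reset(self):
        self.x_max = 0          # reset value 00...0
        self.x_min = WORD_MAX   # reset value 0111...1

    def sample(self, px):
        # Executed once per sample period
        if px > self.x_max:
            self.x_max = px
        if px < self.x_min:
            self.x_min = px

    def end_of_interval(self, threshold):
        # Executed during the non-real-time pitch computation
        a1 = abs(self.x_max - self.x_min)
        a2 = self.previous
        self.previous = a1
        self.reset()
        return max(a1, a2) > threshold  # True: pursue Buzz, False: send Hiss
```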

A flow  graph  of  non-real  time  pitch  extraction  is  shown  in  Fig.  2. 

2.3  Synthesis 

The  purpose  of  this  operation  is  to  reconstruct  synthetic  speech  from  the 
received  pitch  and  spectrum  data. 

The received data contains two 7-bit words representing pitch and 35 bits
representing spectrum energy approximately every 20 msec. More information
about the format and timing of the 2400-bit/sec data stream transmitted from
analyzer/pitch detector to synthesizer over a digital transmission link will
be given in Section 2.4. The pitch words are used to control the period of a
digital-equivalent impulse train. If Hiss is received, +1, -1 impulse pairs
are output with period selected by a random number generator. These impulses
are used as inputs into a cascade of (usually) 3 or 4 second-order difference
equations whose aim is to approximate the vocal tract impulse response during
vowel production. Functions of this kind are referred to as formant filters.

Fig. 2. Non-real-time pitch computation flow graph.

They also have a "smearing out" effect on the sharp impulses used as inputs.
For more information see Ref. 2. The computational form of the formant filters
is identical to that given in Eq. (7). The constants, however, are different.

The  original  is  now  designated  and  varies  for  each  of  the  formant  sec- 
tions. The  original  k^  is  changed  to  k^  and  is  the  same  for  all  sections. 

The formant filter output is the input into 49 band-pass poles identical to
the ones used in the analysis section. Their outputs are weighted and summed
also in the same way; however, no moduli (envelopes) are taken. Let the 16
results be denoted by $A_i(n)$. The spectrum information extracted from the
received data is decoded and converted into a 16-element vector. Each entry is
passed through a third-order low-pass filter of the type described in the
analysis (see Figs. 6 and 7), giving 16 outputs $B_i(n)$. The $A_i(n)$ are used
to modulate the $B_i(n)$ to give the 16 $E_i(n)$ according to the following
scheme:


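$$E_i(n) = \begin{cases} \tfrac{1}{16} B_i(n) & \text{if } A_i(n) \ge 0 \\ -\tfrac{1}{16} B_i(n) & \text{if } A_i(n) < 0 \end{cases} \quad \text{for } i \text{ odd}$$

$$E_i(n) = \begin{cases} -\tfrac{1}{16} B_i(n) & \text{if } A_i(n) \ge 0 \\ \tfrac{1}{16} B_i(n) & \text{if } A_i(n) < 0 \end{cases} \quad \text{for } i \text{ even} \tag{20}$$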

Another  set  of  band-pass  poles,  again  with  the  same  coefficients  as  in  the 
analysis  is  used  but  now  in  the  reverse  fashion.  Thus, 
For Band-Pass Pole


Input 


1 


2 


3 


4 


5 


etc . 


With these inputs, the first 25 band-pass pole outputs $G_i(n)$ are computed.
It will be noted from Table 2.1 that the resonant frequency of the 25th pole
pair is 1601 Hz, and that of the 26th 1681 Hz. Also, as will be discussed in
more detail later, the pre-sample input and post-sample output analog filters
cut off at 3.3 kHz with very sharp rejection. Thus, the second harmonic of the
26th band-pass pole, with a center frequency of 3362 Hz, lies in the rejection
band of the analog output filter. Therefore, the 26th and up to the 49th
band-pass poles contribute nothing to the synthesis that is not already
provided by the post-sample output analog filter. These poles may therefore be
neglected.

The final output is then

                                                                        (21)

x*(n) is converted into an analog signal via a D/A converter and fed into the
voice output.

2.4  Frame  Data  Encoding  and  Decoding

The data extracted from the analysis and pitch computation sections is further
condensed and transmitted in 49-bit frames. Since the data rate is 2.4 kb/sec,
a frame duration is 49/2400 sec = 20.41667 msec. There are two 7-bit pitch
words per frame, representing a pitch computation every half frame, i.e., every
10.20833 msec. The remaining 35 bits are used to represent the energy in the
16 analysis channels. As can be seen from Table 2.3, 7-bit pitch words are
adequate to cover the required pitch range. Therefore, no further coding is
needed here. However, 35 bits every 20 msec is not adequate to transmit the
spectrum information directly. The analysis section supplies 16 words, 16 bits
each, a total of 256 bits, which now have to be compressed into 35 bits with a
minimum sacrifice in information content. Even if 4 or 5 bits per sample with
logarithmic quantization were used (a reasonable approach), we would still have
64 or 80 bits per frame, too many by about a factor of 2. We must take
advantage of the high degree of correlation observed to exist between speech
spectral samples.
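
The frame arithmetic in this paragraph can be checked with a few lines. The constant and function names are illustrative only; the numbers are those quoted in the text.

```python
BIT_RATE = 2400          # bits per second on the link
FRAME_BITS = 49          # bits per frame
PITCH_WORDS = 2          # pitch words per frame
PITCH_BITS = 7           # bits per pitch word
CHANNELS = 16            # spectrum analysis channels

frame_ms = 1000.0 * FRAME_BITS / BIT_RATE          # frame duration in msec
half_frame_ms = frame_ms / 2.0                     # interval between pitch words
spectrum_bits = FRAME_BITS - PITCH_WORDS * PITCH_BITS


def direct_bits(bits_per_channel):
    """Bits per frame if every channel were quantized independently."""
    return CHANNELS * bits_per_channel


if __name__ == "__main__":
    print(round(frame_ms, 5), round(half_frame_ms, 5), spectrum_bits)
    print(direct_bits(16), direct_bits(4), direct_bits(5))
```

Independent 4-bit or 5-bit quantization of the 16 channels would need roughly twice the 35 bits available, which is why the correlation between channels has to be exploited.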

After a great deal of work, a technique was found some years ago (Ref. 3) which
removes some redundancy and is easy to implement. The 16 spectrum samples,
$S_i(n)$ say, can be thought of as elements of a 16-dimensional vector $S_i$.
There exists a linear transformation, the Hadamard matrix, with elements
limited to +1 and -1, which transforms $S_i$ into $S_i'$, i.e.,

$$S_i' = [H]\, S_i \tag{22}$$

such that the elements of $S_i'$ are arranged by decreasing order of
information content.

As  noted  above,  logarithmic  rather  than  linear  quantization  can  be  used 
to  provide  maximum  dynamic  range  for  a given  number  of  bits.  This  follows  from 
the  empirical  observation  that  speech  perception  of  the  human  ear  is  roughly 
logarithmic  in  nature. 

Thus, the encoding of spectral samples consists of taking the logarithm of the
spectral envelopes, transforming them using the Hadamard matrix, and
transmitting 35 bits of information about the result, with a maximum number of
bits assigned to the first element and progressively fewer to the adjacent
ones. The data, converted to a 2.4-kb/sec serial stream, is transmitted over
the communications link. At the receive end, the 35 spectral bits are decoded
and a receive spectral vector $RS_i'$ is formed. Then

$$RS_i = [H]^{-1}\, RS_i' \tag{23}$$

It can be shown that $[H]^{-1} = [H]/O(H)$, where $O(H)$ denotes the order of
the H matrix, in this instance 16. The antilogs of the elements of $RS_i$ then
give the spectrum inputs into the synthesis section.
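
The forward and inverse transformations can be illustrated with a small sketch. It builds a 16 x 16 Hadamard matrix by the standard Sylvester doubling construction, which is an assumption: the row ordering actually used in the vocoder is not given in this section. Quantization is ignored, so the round trip is exact. Function names are hypothetical.

```python
def hadamard(order):
    """Sylvester-construction Hadamard matrix; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h


def transform(h, vec):
    """Matrix-vector product using only additions and subtractions."""
    out = []
    for row in h:
        acc = 0
        for coeff, value in zip(row, vec):
            acc = acc + value if coeff > 0 else acc - value
        out.append(acc)
    return out


def inverse(h, vec):
    """Inverse transform: the same matrix, scaled by 1 / order."""
    order = len(h)
    return [x / order for x in transform(h, vec)]


if __name__ == "__main__":
    H = hadamard(16)
    log_spectrum = [12, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4]
    coded = transform(H, log_spectrum)
    print(coded[0])              # first element: sum of all 16 inputs
    print(inverse(H, coded))     # recovers the log spectrum
```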

Details  of  the  encoding  and  decoding  process  are  given  next.  The  encode 
bit  lineup,  after  log  taking  and  Hadamard  transformation  is  shown  below: 


[Bit lineup diagram: the 16-bit A words for Cases A, B, C and D, with the 7-bit
log field indicated. Legend: x = information bits, n = unused bits, s = sign
bits.]


A 7-bit logarithm is used. After going through the Hadamard transformation the
top line (Case A) may be multiplied by up to 16 (since the top row of the
Hadamard matrix consists of all +1's). Hence, the indicated 7-bit log lineup.
All other rows of the Hadamard matrix contain an equal number of +1's and
-1's, so the maximum shift cannot exceed three binary places. Bit 15 therefore
is the effective sign bit for Cases B, C, and D.

Case A:  Bit 15 is a value bit, but the whole word is known to be positive.
         Since this case corresponds to $S_0'(n)$, all 5 bits are used to
         describe it.

Case B:  If the A word is positive and not greater than 7, the sign bit (0 in
         this case) plus the 3 bracketed bits are used, giving a 4-bit
         representation. If A is positive but greater than 7, 0111, i.e., 7,
         is used. For negative numbers not smaller than -8, the 4-bit
         representation is used as it stands. For A less than -8, just -8,
         i.e., 1000, is chosen.

Case C:  The same truncation principle as above is used here, except now only
         on two bits, the sign bit and one value bit. Thus only numbers
         between 01 and 10 are generated.

Case D:  Only the sign bit is transmitted for this set.
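
The truncation rule in Cases B, C and D amounts to saturating a signed value to a short 2's complement field. A minimal sketch follows; the function names are hypothetical and the selection of which bits of the A word form the input value is not modeled.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed 2's complement field of `bits` bits."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def to_bits(value, bits):
    """2's complement bit string of an already-saturated value."""
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))


if __name__ == "__main__":
    for a in (3, 12, -3, -20):
        print(a, to_bits(saturate(a, 4), 4))   # Case B, 4 bits
    for a in (5, -9):
        print(a, to_bits(saturate(a, 2), 2))   # Case C, 2 bits
```

For the 4-bit case this yields 0111 for any input above 7 and 1000 for any input below -8, matching the rule stated for Case B; the 2-bit case is limited to 01 through 10.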


The 16 elements of $S_i'$ are then assigned to the following groups:

                                  TABLE 2.4

    Element of S_i'              Case   No. of Bits in   Total No. of   S_21  S_11
                                        Representation   Bits Used
    S_0'(n)                       A           5                5          0     0
    S_1'(n) to S_4'(n)            B           4               16          0     1
    S_5'(n), S_6'(n), S_7'(n)     C           2                6          1     0
    S_8'(n) to S_15'(n)           D           1                8          1     1
                                                    Total     35 bits

The values $S_{11}$ and $S_{21}$ are merely used as a digital counter to
distinguish the four different cases. Denoting the encoded outputs by $O_i$, it
will be found that Boolean expressions may be derived for them in terms of the
$A_i$ and $S_{11}$, $S_{21}$. These are given in Eq. (24).

[Eq. (24): Boolean expressions for the encoded outputs $O_0$ to $O_4$ in terms
of the A-word bits and $S_{11}$, $S_{21}$.]



The encoded words to be transmitted are shown in Table 2.5 by an X. N denotes
"do not use."

                  TABLE 2.5

                Encode Outputs

    Case    O_4   O_3   O_2   O_1   O_0
     A       X     X     X     X     X
     B       X     X     X     X     N
     C       X     X     N     N     N
     D       X     N     N     N     N

At the receive end the serial data is converted into an 8-bit word and lined
up as shown in Table 2.6.

                            TABLE 2.6

                          Decode Inputs

    Case   OP_7  OP_6  OP_5  OP_4  OP_3  OP_2  OP_1  OP_0
     A      N     N     N     X     X     X     X     X
     B      N     N     N     N     X     X     X     X
     C      N     N     N     N     N     N     X     X
     D      N     N     N     N     N     N     N     X

The function of the decoding is to produce a lineup of numbers such that
dynamic ranges are maximized. This implies using the highest possible position
closest to the sign bit. At the same time, care must be taken to ensure that
the process to follow does not produce an overflow with these numbers. In this
case, the following computation is the inverse Hadamard transformation. This
implies, in the first row, a summation of all received values. If we now assume
that $OP_7$ in Table 2.6 is equated with the sign bit, then $OP_6$ to $OP_0$
represent the first through seventh value bits. For Case A, all N's will be 0
since it is known that this number is positive. For the others no such
guarantee exists, therefore the N's in these cases have to be assumed to be
sign extensions. Two extreme cases arise, one when all values received are
positive, the other when they are all negative. For the first case the largest
value word A can have is 31. Word B can be 7; however, there are four of these,
contributing a total of 28. The three words C can contribute a maximum of
three, and D can at most be 0, so it does not contribute. The total is 62.
However, one must add eight to the word A. This follows from the fact that in
2's complement arithmetic truncation, whether the number is positive or
negative, always adds an error in the same direction. Since in the construction
of word A only additions are used (16 of them), the number will, on average, be
smaller by half the order of the transformation, 8 in this case. So the total
maximum positive sum for all received words, using the lineup in Table 2.6, is
70. Using a similar procedure for an all-negative input (A can be only zero
here) and adding 8 to A gives a minimum of -38. Thus 7 bits are sufficient to
represent all eventualities and identification of $OP_7$ with the sign bit is
exactly right.
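
The overflow argument can be verified numerically. The sketch below sums the extreme values of each group using the word counts and field widths of Table 2.4 together with the offset of 8 added to word A; the names are illustrative. With these figures the all-positive extreme evaluates to 31 + 28 + 3 + 0 + 8 = 70.

```python
# case: (number of words, bits per word); Case A is unsigned, the rest signed
GROUPS = {"A": (1, 5), "B": (4, 4), "C": (3, 2), "D": (8, 1)}
ROUNDING_OFFSET = 8  # half the order of the transformation, added to word A


def extremes():
    """Return (minimum, maximum) of the first-row sum of the inverse transform."""
    a_bits = GROUPS["A"][1]
    high = (1 << a_bits) - 1 + ROUNDING_OFFSET   # A at its largest, 31
    low = 0 + ROUNDING_OFFSET                    # A can only be zero here
    for case in ("B", "C", "D"):
        count, bits = GROUPS[case]
        high += count * ((1 << (bits - 1)) - 1)  # largest positive field value
        low += count * (-(1 << (bits - 1)))      # most negative field value
    return low, high


if __name__ == "__main__":
    low, high = extremes()
    print(low, high)
    print(-128 <= low and high <= 127)  # fits a sign bit plus 7 value bits
```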

III.  COMPUTATIONAL  FORMS  AND  STRATEGY 

As indicated at the beginning, the machine we built was designed to implement
the computational forms produced by the channel vocoder algorithms. It was of
central importance that these forms be executed in a minimum of time and
program storage. This section lists the required computational capabilities
and discusses the most efficient way to achieve them.

Let us begin then with the 49 band-pass poles. The equations to be executed
assume the form given in Eq. (1). There are 49 such poles in the analysis and
74 in the synthesis operation, a total of 123. If each line of program is
executed in one basic cycle time $T_c$, and n lines of code are needed per
band-pass pole, an execution time of 123n $T_c$ will be required. Given the
basic algorithm and a minimum value of $T_c$ which the hardware is capable of,
minimizing n without much increasing $T_c$ is the only possible approach to
minimizing total execution time. This implies parallel hardware. An array
multiplier executing a complete multiply during a single $T_c$ is therefore
desirable. Current Schottky TTL 16 x 16 array multipliers can be made to give a
product in about 100 nsec (typical). This would imply an upper limit of 10 MHz
on the machine cycle frequency. Assuming for the moment that $T_c$ = 100 nsec,
each unit increase in n adds 12.3 µsec to the execution time. This represents
nearly 10 percent of the total 140-µsec sample period available for the basic
spectrum computation.
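
The timing budget can be reproduced directly from the figures in the text. The helper below is a sketch with hypothetical names; it simply evaluates 123 n $T_c$ for a chosen number of program lines per pole.

```python
POLES_ANALYSIS = 49
POLES_SYNTHESIS = 74
SAMPLE_PERIOD_US = 140.0   # sample period available for the spectrum computation


def pole_time_us(lines_per_pole, cycle_ns=100.0):
    """Execution time in microseconds for all band-pass poles."""
    poles = POLES_ANALYSIS + POLES_SYNTHESIS
    return poles * lines_per_pole * cycle_ns / 1000.0


if __name__ == "__main__":
    per_line = pole_time_us(1)
    print(per_line, per_line / SAMPLE_PERIOD_US)   # cost of one extra line
    print(pole_time_us(3))                         # three lines per pole
```

At three lines per pole and a 100-nsec cycle the band-pass poles alone take 36.9 µsec, about a quarter of the sample period, which is why each additional line per pole matters.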

The machine must be able to execute the adds and subtracts in parallel with
the multiplies. These operations must be capable of being expressed by the
following functional equation:

    (Operand 1) (operation) (Operand 2) → (Result Destination)          (25)

This implies three simultaneous addresses. Assuming at least 8 bits per
address, a minimum of 24 bits of address code will be needed.

The two multiplies already imply two cycle times. Since an addition has to be
performed after the multiplies, a minimum of three $T_c$ must be available.
The aim therefore will be to build the machine such that no more than three
cycle times are needed for each second-order iteration of the type shown in
Eq. (1). If a hard-wired DO LOOP mechanism of the type discussed in the
introduction is used, no additional increase in n due to looping will be
required.

Equation (2) implies the need for a ROM in which the 49 constants are stored.
Equation (3) requires a mapping of X into exp(-X). Again this may be done by a
ROM. The next computational forms appear in Eq. (4): scaling by half and the
taking of a modulus. The fastest way of achieving a scaling by one half is to
use a multiplexer appropriately wired. The taking of the modulus (absolute
value) may be done as follows: let us assume that an ALU is available which
shifts the input through to the output on one of its channels when, for
example, the code word $x_1 x_2 x_3$ is supplied and provides the inverse of
the input when the code word is complemented, i.e., becomes
$\bar{x}_1 \bar{x}_2 \bar{x}_3$. The modulus can then be implemented by
connecting the $x_i$ through exclusive ORs. The other free input into the
exclusive ORs is the sign bit of the input word. When this is positive, the
sign bit is zero and the outputs of the exclusive ORs are $x_1 x_2 x_3$,
implying a straight shift through. For negative inputs, the sign bit is one and
the $x_i$ will become inverted, i.e., $\bar{x}_1 \bar{x}_2 \bar{x}_3$, forcing
the input to appear inverted on the output. This process takes a single $T_c$
and can be initiated by a single command (MOD for example).
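
The exclusive-OR arrangement can be modeled as below. The select codes, the 16-bit word width and the reading of "inverse" as 2's complement negation are assumptions chosen for illustration, not details taken from the hardware description.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1
PASS = 0b000      # hypothetical ALU select code: input shifted straight through
NEGATE = 0b111    # its bitwise complement: input appears negated at the output


def alu(select, word):
    """Toy ALU that implements only the two complementary functions."""
    if select == PASS:
        return word & MASK
    if select == NEGATE:
        return (-word) & MASK
    raise ValueError("unsupported select code")


def mod(word):
    """Modulus: every select line is XORed with the sign bit of the input."""
    sign = (word >> (WIDTH - 1)) & 1
    select = PASS ^ (0b111 * sign)   # unchanged if sign = 0, complemented if 1
    return alu(select, word)


if __name__ == "__main__":
    print(mod(5), mod((-5) & MASK))
```

Because the sign bit itself chooses between the two complementary codes, no separate test-and-branch step is needed.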

Equations (6) and (7) do not impose any new requirements. The scaling by 2
(incidentally this already occurs in Eq. (1)) can again be solved by the use
of a multiplexer. It is hoped that Eq. (6) would take only two program steps
since only one multiply and one subsequent add are needed.

Equation (8) requires a 3-way decision. This can be achieved in two
conceptually quite different ways. The machine can be made to access one of
three different addresses depending on whether $x_{n+1} > x_n$,
$x_{n+1} = x_n$ or $x_{n+1} < x_n$ and write the corresponding one of three
values (+1 or -1 for the two inequalities) into the result location. This, in
addition to the decision step, requires at least three additional program
lines and an unconditional GO TO command. If pipelining is used the number of
lines required may double or even treble. If, however, those lines can be used
for other essential operations as well, this may be a very acceptable
solution. An alternative method is to use an approach similar to that proposed
for the modulus function. Thus, operation A or B is performed, depending on the
outcome of a comparison between two numbers. Two-way comparisons are
reasonably simple to implement. The three-way case becomes much more difficult.
It may, of course, be hardware implemented. The additional complexity this
introduces, however, was deemed too high a price, especially since two lines of
software using two-way comparisons can be used to make a three-way decision.
The following three single-line software capabilities were developed:

    1.  COMP(A > B) IF MET (X OP1 Y); IF NOT MET (X OP2 Z)
    2.  COMP(A = B) IF MET (X OP1 Y); IF NOT MET (X OP2 Z)
    3.  COMP(A < B) IF MET (X OP1 Y); IF NOT MET (X OP2 Z)

Any of the X, Y or Z can be either of the two comparison words A or B.
Operations OP1 and OP2 may, but do not have to, involve the two numbers. Thus,
they can be an addition or subtraction involving, say, X and Z; alternatively,
OP1 could specify a shift through whilst OP2 a negation of X. Such an operation
could be used to implement the modulus function discussed above. The
instruction for this would be:

    COMP(X < 0) IF MET (SHIFT -X); IF NOT MET (SHIFT X).                (27)

Despite the introduction of these very powerful instructions, the modulus
instruction was retained as a separate entity. This enhances programming
flexibility.
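
How two two-way COMP lines can produce a three-way decision is sketched below. The `comp` helper is a software stand-in for the instruction, and the value written in the "equal" case is left as a parameter because this section does not fix it; all names are hypothetical.

```python
def comp(condition, if_met, if_not_met):
    """Model of one COMP line: run one of two operations on a two-way test."""
    return if_met() if condition else if_not_met()


def three_way(x_next, x_now, middle=0):
    """Three-way decision built from two consecutive two-way COMP lines."""
    # Line 1: COMP(x_next > x_now) IF MET (+1); IF NOT MET (middle value)
    result = comp(x_next > x_now, lambda: 1, lambda: middle)
    # Line 2: COMP(x_next < x_now) IF MET (-1); IF NOT MET (keep line-1 result)
    result = comp(x_next < x_now, lambda: -1, lambda: result)
    return result


def modulus(x):
    """Eq. (27): COMP(X < 0) IF MET (SHIFT -X); IF NOT MET (SHIFT X)."""
    return comp(x < 0, lambda: -x, lambda: x)


if __name__ == "__main__":
    print(three_way(5, 3), three_way(3, 3), three_way(1, 3))
    print(modulus(-4), modulus(4))
```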

The flow diagram of Eq. (8) is shown in Fig. 3. It takes two dedicated lines
of software to program. A third line, a write of the result, is also needed;
however, this can be shared with other operations.

Equation (9) needs no new facilities. The MOD and COMP instructions will give
single-line software for its implementation. Similarly, computational forms
needed for Eqs. (10) to (16) are already available. Equations (17) are Boolean
expressions. One can either utilize the logic expression facilities of
available ALUs or (if the number of lines involved is not too large) use one or
more ROMs programmed to give the required expressions. In the present case 8
input lines and one output are required, making it ideal for a single ROM
realization.

Fig. 3. Flow diagram for the three-way decision of Eq. (8), giving +1, an
intermediate value, or -1 depending on whether $X_{n+1} > X_n$,
$X_{n+1} = X_n$ or $X_{n+1} < X_n$.

Nothing new is required again for Eqs. (18) and (19). Equation (20) could be
implemented in software. All the even $A_i(n)$, collected in an array
$AE_i(n)$ say, used in a DO LOOP containing one COMP instruction and one
multiply (by 1/16), would give the even $E_i(n)$. Similarly for the odd
$E_i(n)$. This would take some six lines of coding. It appears, however, that
with very little extra hardware a single-line command to compute the $E_i(n)$
from the $A_i(n)$ can be obtained. This is based again on the principle of
complementing commands for opposite functions. Depending on whether the most
significant bit out of a storage register (designated ACCA for accumulator A)
containing $A_i(n)$ is 0 or 1, a hard-wired 1/16 $B_i(n)$ is either shifted
through directly or inverted. This is done with or without an additional sign
reversal under control of the least significant bit of the DO LOOP counter,
which is 0 for even and 1 for odd consecutive passes. This command is
designated DOIB for Direct or Inverse through accumulator B. The hardware
implementation only requires the addition of a few gates.
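
The DOIB behavior described above can be modeled in software as follows. Which loop parity receives the additional sign reversal is an assumption here (even passes), as is the use of floating-point division in place of the hard-wired 1/16 scaling; the function name is hypothetical.

```python
def doib(a, b, pass_lsb):
    """Direct-or-inverse transfer of b/16 under control of two single bits.

    a        : contents of ACCA; only its sign (most significant bit) matters
    b        : value to be scaled by 1/16 and passed or inverted
    pass_lsb : least significant bit of the DO LOOP counter (0 even, 1 odd)
    """
    msb = 1 if a < 0 else 0
    extra_reversal = 1 - pass_lsb        # assumption: reversal on even passes
    invert = msb ^ extra_reversal        # the "few gates": one exclusive OR
    scaled = b / 16
    return -scaled if invert else scaled


if __name__ == "__main__":
    for lsb in (1, 0):
        print(lsb, doib(3, 32, lsb), doib(-3, 32, lsb))
```

A single exclusive OR of the two control bits decides between the direct and inverted paths, which is why the command costs only one program line.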

The next equation, (21), again does not require any new features. Equation
(22), however, is a matrix operation and would therefore imply a series of
multiplications. Fortunately H is composed entirely of +1 and -1 and only
additions and subtractions are needed. Also any one entry of H, $h_{ij}$, is
given by

                                                                        (28)

This represents an 8-input (4 bits of i and 4 bits of j) to one-output
transformation. So a single ROM can give all $h_{ij}$ values. A command HAD
inside a DO LOOP takes over addition and subtraction under control of $h_{ij}$
and gives a single-line realization. Sixteen runs through this DO LOOP give one
line of S. HAD will, of course, also be used for the inverse Hadamard
transformation.
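
A ROM-driven add/subtract loop of this kind might look as follows. The entry rule used to fill the table is the standard natural-order (Sylvester) one, $h_{ij} = (-1)^{\text{popcount}(i \wedge j)}$, which is assumed here for illustration; the ordering and the exact expression used in the machine may differ. All names are hypothetical.

```python
def h_entry(i, j):
    """Assumed entry rule: sign set by the parity of the bits common to i and j."""
    return -1 if bin(i & j).count("1") % 2 else 1


# 256-word, 1-output "ROM": address = 4 bits of i followed by 4 bits of j
HAD_ROM = [h_entry(addr >> 4, addr & 0xF) for addr in range(256)]


def had_row(i, s):
    """Sixteen passes of an add-or-subtract loop give one output element."""
    acc = 0
    for j in range(16):
        if HAD_ROM[(i << 4) | j] > 0:
            acc += s[j]      # ROM output selects addition
        else:
            acc -= s[j]      # ROM output selects subtraction
    return acc


if __name__ == "__main__":
    s = list(range(16))
    print([had_row(i, s) for i in range(16)])
```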
Finally, Eq. (24) is also realized using one ROM.

The discussion in this section should give some feel for the way in which the
machine was designed. Those parts which may seem a little vague should become
much clearer in the more thorough discussion of machine structure presented in
the next sections.

IV.  OVERALL  MICROPROCESSOR  STRUCTURE 

The  discussion  from  now  on  is  concerned  exclusively  with  the  final  machine 
design.  This  is  the  end  product  of  many  iterations  of  the  type  described  in 
the  previous  sections. 

A block schematic of the machine is shown in Fig. 4. Three-level pipelining is
used. The program counter, together with its controls, program and address
code ROMs make up the first time zone. The second contains program decoding,
an arithmetic section devoted entirely to RAM memory addressing, hardware
necessary to implement program operations such as the DO LOOP, GO TO, etc.,
ROMs containing constants like those of Eq. (1), exp(x), etc., and the Hadamard
transformation ROM designated HAD ROM. The third zone contains arithmetic
processing. Each zone is separated from the others by a layer of clocked
buffers. Buffer transfers occur at a clock rate CLL equal to 1/$T_c$. Thus, the
hardware in each zone is on its own to complete all the operations required of
it during one period of duration $T_c$. The final machine clock was chosen to
be 8 MHz or less (this remark should become clearer after reading Section
5.14). Thus $T_c$ is not less than 125 nsec. The remaining blocks in Fig. 4 are
the input/output sections for voice sample and communications link data and the
machine control logic. They are outside the pipelining structure since they
have no functions which must be completed during one $T_c$. As such they are
really not a part of the basic microprocessor.

Basically, pipelining implies that the arithmetic section executes the commands
processed by the logic section in the prior $T_c$, which appeared in the
program 2 $T_c$'s ago. Schematically, this can be represented as follows:


                                TABLE 4.1

    Number of
    Intervals    Time Zone 1      Time Zone 2    Time Zone 3
        1        PR Step N        -              -
        2        PR Step N+1      Logic N        -
        3        PR Step N+2      Logic N+1      Arithmetic N
        4        PR Step N+3      Logic N+2      Arithmetic N+1
        5        PR Step N+4      Logic N+3      Arithmetic N+2
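
The staggering shown in Table 4.1 can be generated by a short sketch that shifts each program step through the three time zones, one zone per clock interval. Function names are illustrative only.

```python
def label(offset):
    """Step label relative to N."""
    return "N" if offset == 0 else "N+{}".format(offset)


def pipeline_schedule(intervals):
    """Rows of (time zone 1, time zone 2, time zone 3) for each clock interval."""
    rows = []
    for t in range(intervals):
        zone1 = "PR Step " + label(t)                       # fetched this interval
        zone2 = "Logic " + label(t - 1) if t >= 1 else "-"  # fetched one T_c ago
        zone3 = "Arithmetic " + label(t - 2) if t >= 2 else "-"  # two T_c ago
        rows.append((zone1, zone2, zone3))
    return rows


if __name__ == "__main__":
    for number, row in enumerate(pipeline_schedule(5), start=1):
        print(number, *row, sep="  ")
```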

"Logic N" denotes that part of the command in program step N which is
executable in the logic section, while "Arithmetic N" denotes the part which
requires the arithmetic section to perform some task. Not every programming
step requires action from both logic and arithmetic parts. For example, the DO
LOOP command has an effect only on the logic section, while some add
instructions will involve the arithmetic section alone. Generally, though, most
programming steps will require some action in both sections. It is of course
necessary at times to store in temporary memory (RAMs) results of arithmetic
operations. Since two time zones are involved in this, care must be taken to
give the appropriate commands at the proper times. This type of operation is
indicated in Fig. 4 by the lines emanating from the arithmetic processing unit
into time zone 2 to RAM memory and logic organization units. Similarly a DO
LOOP requires control of the program stepping and an indication of program
position. This is indicated in Fig. 4 by the lines crossing from time zone 2
into 1 and vice versa. These and other similar cases will be discussed in
detail in Section 5.

Table 4.1 should not be construed as an example of how the program is written.
As a matter of fact each line of program crosses 3 time zones. Thus, a typical
line would be:

    PR STEP N; LOGIC N; ARITHMETIC N.

If an arithmetic operation result in line N, say, is to be processed in the
logic section, the associated logic command will then appear in a line below,
i.e., N+1. In this way the program step and line number become the same thing.
Section 6 will give more detail on programming procedures.

V.  DETAILED  MICROPROCESSOR  STRUCTURE 

Structural details of the machine are given in Figs. 5, 6, 7, and 8. The
discussion will proceed in terms of functions performed and resultant hardware
realizations. In this way all interrelated blocks appear in the same
description. Some units which perform more than a single function will appear
in several places; however, their description will be from a different point of
view.

5.1  Program  Counter  and  ROMs 

The ROMs used to store the program and addressing information are 256 x 4
field-programmable devices with a typical access time of 40 nsec and a
guaranteed maximum of 60 nsec. Since T = 125 nsec, it was considered prudent to
include in the first time zone only the program counter and the program and
address code ROMs. The 256 counts needed are supplied by two 4-bit counters.
Addresses are supplied via line drivers (not shown) to the 8 program ROMs, the
6 address code ROMs, the Page ROM and the Shift ROM. The counter output is
also used as the B input in comparator 1 for DO LOOP purposes (see Section
5.6). The counter clock CLL (see Section 5.14) is the basic system clock,
nominally 8 MHz. The all-full indicator on the counter, designated TCC, is used
in pitch decoding (see Section 5.13). Presetting of the counter is under control
of the DO LOOP, IF and GO TO instructions (see Sections 5.6 and 5.7). The PE
(parallel enable) inputs are forced low at the appropriate point and an
external input, supplied by a multiplexer, is set into the counter after a
low-high transition of the next CLL. The counter multiplexer consists of 4 dual
4-input units. Only 3 inputs are used.

Fig. 5. Vocoder microprocessor program and program decode sections.
Fig. 6. Vocoder microprocessor arithmetic unit.
Fig. 7. Vocoder microprocessor interface units.
Fig. 8. Preemphasis network and gain stage (f = 350 MHz).


    Select
S1      S0      Inputs
0       0       Not used. Unconnected
0       1       8 bits from ADR ROM B BUFFER
1       0       8 bits from ADR ROM C BUFFER
1       1       8 bits from DF LATCH

The  select  lines  are  under  control  of  the  IF  and  GO  TO  commands  (see  Sec- 
tion 5.7). 

The outputs of all ROMs described here go into clocked buffers (clocked by
CLL) which separate the first from the second time zone. In addition, program
ROM 2 line 4, in conjunction with program bit 7 of Address ROM C, provides via a
NAND gate and clocked buffer the S0 input to MUX Y. Details are given in the
next three sections. Also, program ROM 8 bits 2, 3, and 5 provide an indication
in the first time zone of the IF command. The reason for this is that COMP is
executable in the arithmetic section while IF is performed a time zone earlier
in the logic section. Therefore the COMP-IF operation (see Section 5.6 for
greater details) is coded in two consecutive program lines. On the other hand,
information on whether it is to be just a COMP or a COMP-IF instruction is
needed at the same time COMP appears; hence the jumping of a time zone to
provide the IF indication.

5.2  RAM  Memory  and  Buffers 

In Fig. 5 two sets of RAMs, referred to as RAM A and RAM B, are shown. Each
consists of two 256 x 16 arrays arranged as page 0 and page 1, giving a total of
1024 16-bit words of random-access memory. The RAMs are fed by multiplexers.
One of the inputs into these comes from the arithmetic section Write Mux output.
This path enables the results of arithmetic processing to be stored in RAM
memory. The other path comes from the other RAM array. It is therefore possible
to shift data from RAM A into RAM B or vice versa without affecting any
other part of the machine. The two-page controls for both RAMs are also on the
Page ROM. An output multiplexer is also provided for each set of RAMs, enabling
either set to write into the AB or BB buffers. These buffers are individually
clocked by commands Clock AB and Clock BB, respectively. Thus, they provide
the dual functions of individual storage and time zone 2 to 3 isolation. Two
further multiplexer arrays in front of the AB and BB buffers are also used.
They are labeled AB Mux and BB Mux, respectively. Under program control, the
Write multiplexer output in the arithmetic section can be channeled through.
This facility enables the AB and BB buffers to be used as temporary storage
for arithmetic operations. This, as will be shown later, enhances the
versatility of the COMP and COMP-IF instructions.

5.3 Address Processing

Address ROMs A and B in the first time zone, and their clocked buffers in
the second, provide conventional addressing for the two RAM arrays. The buffered
8-bit address is channeled through a multiplexer (ALA Mux and ALB Mux) and through
an ALU (ADR A and ADR B ALU). The ALUs are conventionally in A plus B mode.
The other input comes from MUX Y, which in turn has the value of the DO LOOP
counter, I, on its output (unforced state of S0 = 1 on MUX Y). Outside a
DO LOOP I = 0; thus the address reaching the RAMs will be the 8-bit word
provided by the address ROMs. This arrangement also satisfies all addressing needs
inside a conventional DO LOOP. The first address of a consecutively numbered
array is provided by the addressing ROMs. The I counter then increments once
every run through a LOOP, providing a unit increment for the accessed array
address via MUX Y and the ADR ALU. If unconventional incrementing or random
addressing is needed in a LOOP, the other MUX Y inputs can be used. As an
example, accumulator B (ACCB) may be incremented by some fixed integer k stored
in Accumulator A by the command

ACCB = ACCB + ACCA

once during DO LOOP execution. The contents of ACCB can then be channeled via
MUX Y to the appropriate address ALU. Another possibility is to use the AB
buffer. This provides the most versatile addressing but is restrictive in the
sense that one RAM array becomes unavailable. For example, a set of
random numbers computed by the program and stored in, say, RAM A in consecutive
locations may be used in a subsequent DO LOOP to address RAM B via the AB
buffer and MUX Y. The address ALU must then be in shift A mode and the ALA MUX
in state 1, i.e., it channels the contents of CBI to RAM A. In this way the
RAM A address can be incremented consecutively, whilst RAM B is under AB buffer
addressing control. CBI is a counter which may be incremented by unity on
command CBI^; alternatively, a word appearing on the output of the A ROMs can be
written into it. Unit incrementing of CBI was already described in conjunction
with use of the AB buffer for addressing. The presettability of CBI provides
another way of random addressing, this time of both RAMs (if required); however,
the random sequence has to be programmed into the A ROM. The address C ALU has
an unforced state of A plus B. Thus a base address provided by ADR ROM C can
be incremented in a DO LOOP via MUX Y. These addresses are supplied to the A^
ROM, which in turn contains a preprogrammed random sequence of addresses in a
consecutive array. These are then clocked into CBI and are available to RAM A
and B via inputs 1 on the ALA and ALB multiplexers and ADR ALUs (usually the
latter will be in Shift A mode).
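
The conventional case described above (address ALU in A plus B mode, MUX Y
carrying the I counter or ACCB) can be sketched in a few lines of Python. This
is only an illustration: the function names are hypothetical, and the 8-bit
wraparound of the sum is an assumption based on the 8-bit address width, not
something the text states.

```python
def ram_address(rom_base: int, mux_y: int) -> int:
    """Address reaching a RAM array with the address ALU in A plus B mode."""
    # Assumption: the 8-bit address ALU wraps around modulo 256.
    return (rom_base + mux_y) & 0xFF


def loop_addresses(rom_base: int, passes: int, stride: int = 1) -> list[int]:
    """Addresses seen on successive passes through a DO LOOP.

    stride=1 models the I counter on MUX Y; any other stride models
    ACCB = ACCB + ACCA executed once per pass, with ACCA holding k.
    """
    mux_y = 0  # I = 0 (or ACCB cleared) on entry to the loop
    addresses = []
    for _ in range(passes):
        addresses.append(ram_address(rom_base, mux_y))
        mux_y += stride
    return addresses
```

For a base address of 0x20 supplied by the address ROM, four passes with the I
counter on MUX Y reach 0x20 through 0x23, whereas an ACCB increment of k = 5
reaches 0x20, 0x25, 0x2A.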

Facilities are also provided for nested looping. Usually in such cases
the innermost loop would be a conventional DO LOOP with incremental addressing
provided by the I counter as described above. For the outer loops, however,
I = 0 and unit incrementing has to come from somewhere else. One of the
functions of the I_X counter is to provide this facility. I_X may be set to some
value specified by ADR ROM C and then incremented by unity at every pass through
the loop. In this configuration MUX Y is in the 0 1 state. For more than two
loop nestings, incrementing has to come from either ACCB or the AB buffer. In
principle, since the AB buffer can be accessed from either RAM array and from
the arithmetic section, unconstrained addressing and incrementing for any degree
of loop nesting is possible. However, software has to be used to compute the
addresses and their possible increments. I, I_X and, to a less dedicated
degree, ACCB can provide hardware-oriented nested DO LOOPING up to 3 deep. The
I_X counter can also be used as intermediate address storage. The flexibility of
the system is best illustrated by the fact that up to 4 levels of indirect
addressing are possible. Thus, an address out of ADR ROM C can be modified by
the ADR C ALU by adding some increment to it. This in turn produces an output
from the ROM which can be modified a fourth time by the ADR A or B ALUs.

5.4  Decoders 

The need for high-speed program throughput means that individual instructions
must be very powerful and flexible. As a consequence, most commands have
to be independent of all others. Unfortunately, this in turn generates a need
for wide program words and an attendant increase in ROMs and allied hardware.
Some effort was therefore put into collecting non-interfering commands into
groups (of not more than 7). Each group is then accessed by a one-of-8 decoder.
Only one command of each group can be used at a time; however, a 7 to 3
independent line compression has been achieved. The eighth output of each decoder
is not used. In this way, if none of the particular group of commands is needed,
the eighth output is automatically accessed, ensuring non-interference with the
rest of the program. Of the 32 lines of program code, the bottom 6 are used as
inputs into two decoders (labeled DEC1 and DEC2). This produces an additional
set of 14 commands. An additional two decoders (DEC3 and DEC4) are driven by
the bottom 6 lines of ADR ROM C. Many of the envisioned operations required of
the machine do not need ADR ROM C. With the exception of a multiply, all
arithmetic operations are an example of this. As a consequence, the demands imposed
on ADR ROM C by the operations described to date will not be heavy, leaving
it available for other things. Capitalizing on this fact, the two additional
decoders ensure fuller use of ADR ROM C. The two groups of commands available
from DEC3 and DEC4 are those least frequently used. A full listing is given
in Table 5.1.
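
The grouping scheme can be illustrated with a short Python model of one
decoder. The command names are those of the DEC 1 column of Table 5.1; the
active-low output convention follows the remark later in this section that the
unaccessed state of a decoder output is a 1. The function name and the
dictionary representation are hypothetical conveniences of this sketch.

```python
# DEC 1 command group, indexed by the 3-bit state (000 is the open output).
DEC1 = ["OPEN", "COMP>", "COMP=", "COMP<", "MULT", "MOD", "CLIX", "DOIB"]


def decode(group: list[str], state: int) -> dict[str, int]:
    """Model a one-of-8 decoder with active-low outputs.

    Every command line rests at 1; the line addressed by the 3-bit
    state is pulled to 0. State 0 selects the unused output, so no
    command of the group is issued.
    """
    if not 0 <= state <= 7:
        raise ValueError("state must fit in 3 bits")
    lines = {name: 1 for name in group[1:]}
    if state != 0:
        lines[group[state]] = 0
    return lines
```

Three program bits thus stand in for seven command lines, at the price that
two commands of the same group can never appear in the same program line.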

                              TABLE 5.1

STATE      DEC 1       DEC 2       DEC 3       DEC 4
0 0 0      OPEN        OPEN        OPEN        OPEN
0 0 1      COMP >      ZERO        EXP         XS2
0 1 0      COMP =      DO          CLED        XS1
0 1 1      COMP <      CHA         Not Used    XS0
1 0 0      MULT        DOIAR       rr          ALOG
1 0 1      MOD         IF          tt          HAD
1 1 0      CLIX        INH         CMCL        AB MUX
1 1 1      DOIB        SET I_X     Not Used    BB MUX

The 3 COMP instructions, already described in Section 3, certainly are
mutually exclusive. MULT, used to initialize a multiply, MOD, used to produce a
modulus, and DOIB, used for ACCB control, are all executable in the arithmetic
section and do pose a certain constraint on programming flexibility. CLIX is
used to update I_X by unity. It is a second time zone command and not a very
frequent one. The command ZERO is used to disable the receive decoder
multiplexer 2
(see Fig. 7). The output of the multiplexer goes to channel 7 of the arithmetic
ALU multiplexer A. Selection of this channel together with the ZERO command
makes 0 available for arithmetic processing. DO is the DO LOOP command, to be
described in greater detail in Section 5.6. CHA ensures that the ADR ROM C
buffer output is shifted directly through the ADR C ALU. DOIAR shifts the
arithmetic ALU commands through directly, or inverts them, depending on whether
the least significant bit of the MUX Y output is 0 or 1. IF (see also Section 3.7)
initializes a conditional jump. INH is used in conjunction with DOIB if the
effect of the least significant bit of MUX Y on the DOIB mode of operation is to
be inhibited. This permits ACCB operations to come through as specified or
inverted depending on whether the word contained in ACCA is positive or negative,
respectively (for ACCB operations, see Section 5.8). SET I_X writes the 8-bit
word appearing on the output of ADR C ALU into the I_X counter. The EXP command
disables one ROM, enables another and makes the exponent data available (see also
Section 5.5). CLED clocks the edit shift register (described in Section 2.2).
CMCL is used once in the program. It therefore constitutes a clock indicating
completion of a program run. It is used for pitch computation and timing (see
also Section 5.14). XS2, XS1, and XS0, all mutually exclusive, are used to
select px(n + 72), px(n + 1) or sx(n + 1) (see Section 5.12) and Decoded Pitch,
respectively, from Receive Decoder MUX 2. ALOG is the antilog command; it, like
EXP, disables ROM A^ and makes available antilog data (see Section 5.5). HAD
is the Hadamard matrix transformation command. It channels a multiplexer to
the HAD ROM input (see Section 5.11). The 8 inputs into the HAD ROM are divided
into two four-bit words: the 'i' word, which is updated once every run through
the program, and the 'j' word, which is updated every time HAD appears. A DO
LOOP containing 16 HAD instructions therefore varies j between 0 and 15. This is
done for every value of i (0 to 15). The output of the HAD ROM is the appropriate
add or subtract command specified in the Hadamard transformation (discussed in
Sections 2.4 and 3). The multiplexer channels the HAD ROM output through to the
arithmetic ALU. AB MUX and BB MUX are the controls on the two multiplexers in
front of the AB and BB buffers, respectively. The natural unaccessed state of the
decoder output is a 1. The RAM outputs therefore are channeled through these
multiplexers. A 110 on the DEC 4 inputs forces a 0 into line 6, i.e., AB MUX, and
this selects the Write MUX output through the AB multiplexer. The same happens
with the BB multiplexer when 111 is applied on DEC 4.

Decoders 1 and 2 are permanently enabled. This cannot be permitted for the
3rd and 4th decoders, since otherwise conventional use of ADR ROM C might be
translated into an unwanted command out of these devices. An enable line
designated ENABLE C is provided. Every command in decoders 3 and 4 must be
accompanied by an ENABLE C. The 8th line out of the program clocked buffers in
Fig. 5 is shown as ENABLE C with a bar. The bar indicates that this line is
nominally high and becomes 0 only when accessed, this being the inverse of
conventional usage. A zero on the decoders enables them.

5.5  ALOG  and  Exp  Routines 

The computation of logarithms is done by a software subroutine. Sixteen-bit
input numbers are mapped into all 7-bit words (0 to 127) in a one-to-one
transformation such that adjacent input words are offset from each other by a
constant on a logarithmic scale. The inverse transformation, referred to as
the antilog (hence, the ALOG label), is programmed into ROM. Thus for input
words m = 0 through 127, 128 16-bit output words are available. A 60-dB dynamic
range was assumed to be very adequate. This led to the choice of 0.5 dB steps
and a consequent dynamic range of 64 dB. Thus, generally:

$$20 \log_{10} \left[ \frac{\mathrm{ALOG}(m+1)}{\mathrm{ALOG}(m)} \right] = 0.5 \qquad (29)$$

giving

$$\mathrm{ALOG}(m) = \mathrm{ALOG}(m+1) \times 10^{-1/40} \qquad (30)$$

The largest number representation, assigned to ALOG (128), is $1 - 2^{-15}$. It
was assumed that for the case at hand this was sufficiently close to unity to
define:

$$\mathrm{ALOG}(127) = 10^{-1/40} = 0.9440608763$$

As a result, the general expression for ALOG (m) is given by

$$\mathrm{ALOG}(m) = 10^{-(128 - m)/40} \qquad (31)$$

This expression was used to evaluate the ALOG transformation table.
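
As a check on the numbers quoted above, the table can be regenerated in a few
lines of Python, reading Eq. (31) as ALOG(m) = 10^(-(128 - m)/40). The function
and variable names are hypothetical; the sketch works in floating point and
does not model the 16-bit quantization of the ROM contents.

```python
import math


def alog(m: int) -> float:
    """Antilog table entry for a 7-bit index m, following Eq. (31)."""
    return 10.0 ** (-(128 - m) / 40.0)


# The 128 table entries for m = 0 through 127.
ALOG_TABLE = [alog(m) for m in range(128)]

# Spacing of adjacent entries in dB (0.5 dB by construction).
STEP_DB = 20.0 * math.log10(ALOG_TABLE[1] / ALOG_TABLE[0])

# Span from the smallest to the largest table entry in dB.
SPAN_DB = 20.0 * math.log10(ALOG_TABLE[127] / ALOG_TABLE[0])
```

The top entry reproduces the value 0.9440608763 given in the text, and the 127
steps between the first and last entries span 63.5 dB, in line with the nominal
64 dB quoted for 128 half-dB steps.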

Let us assume that X is a number whose logarithm is required. The procedure
is based on the following algorithm:

$$X - \mathrm{ALOG}(m) \; \begin{cases} > 0: & m = m + \ldots \\ = 0: & m = m \\ < 0: & m = m - \ldots \end{cases} \qquad (32)$$

The evaluation, irrespective of X, is always started with m = 64. It takes exactly
7 iterations, at which point the latest value of m is the required LOG (X).
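
The search can be sketched in Python as a conventional successive
approximation over the 7 bits of m. This is an assumption: the step sizes of
Eq. (32) are not legible in this copy, so the sketch simply uses a halving
step, which is consistent with the stated starting point of m = 64 and the
fixed count of 7 comparisons. The function names are hypothetical, and
ALOG(m) is taken as 10^(-(128 - m)/40).

```python
def alog(m: int) -> float:
    """Antilog table entry for a 7-bit index m."""
    return 10.0 ** (-(128 - m) / 40.0)


def log7(x: float) -> int:
    """Return the largest 7-bit m with ALOG(m) <= x (0 if there is none).

    Assumed schedule: trial indices are formed by adding 64, 32, ..., 1
    in turn, so the first comparison is always against ALOG(64) and
    exactly 7 comparisons are made.
    """
    m = 0
    for step in (64, 32, 16, 8, 4, 2, 1):
        trial = m + step
        if x >= alog(trial):  # X - ALOG(trial) >= 0: keep the larger index
            m = trial
    return m
```

Applied to the table entries themselves, the search returns the index that
produced them, so log7(alog(m)) gives back m for every m from 0 to 127.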

The ALOG command, besides disabling one ROM and enabling ROM A^, also puts a
zero into the most significant bit on the input of the ROM. This ensures that
only addresses 0 to 127 are available as ROM address inputs when the log-taking
routine is in use. When ALOG is not activated, the MSB is held high, with only
addresses 128 to 255 available as inputs. This part is reserved for the EXP (m)
mapping.

EXP out of decoder 3 enables the ROM and makes available at the ROM output the
following transformation, given an input address m:

. g-n2 

EXP  (m)  = e . (33) 

Thus for 128 ≤ m ≤ 255, 0 < EXP (m) ≤ 0.994557.

This table is used exclusively for the exponential rundown procedure during
real time pitch computation, as described in Section 2.1.

5.6  DO  LOOP 

Dedicated DO LOOP operation has already been suggested in the introduction
as a means of speeding up the program throughput. The command is specified by

DO, DF, DL, N                                                         (34)

where DF is the first line of the loop, DL the last and N the number of times the
loop is to be executed. Usually DF is the line immediately following the DO
instruction; however, this need not be so. DL is stored in ADR ROM B. The
output of the DL latch goes to one of the two inputs on a comparator, the other
being the current program counter setting. The DO therefore is a first time
zone operation. Comparator 1 gives an output the moment the counter setting
reaches the value DL. This output, denoted E1 in Fig. 5, performs two tasks.
It clocks the I counter, incrementing I by unity, and, if the output of
comparator 2 (i.e., E2) is low, it parallel enables the program counter, ensuring
that the next value the counter will assume is that available from the counter
multiplexer. The unforced state of the multiplexer is 1 1, which channels the
content of DF. So, as long as there is a parallel enable during program step DL,
the next setting is step DF. This is essentially the looping mechanism.
Comparator 2 has the current value of I as one of its inputs and the output of
the N latch as the other. When equality of I and N is reached, the DO LOOP has
been executed the required N times. E2 goes high, inhibiting the parallel enable
action of E1. The counter therefore continues sequentially, which has the effect
of exiting from the DO LOOP. Two-level indirect addressing permits N to be a
variable. For example, an array of N values, N_j say, is stored in consecutive
address locations (j = 1, ..., n). Let us assume the DO is nested in
an outer one controlled by I_X. If for I_X = 0 ADR ROM C supplies A(N_1), N_1
will be used for the inner DO. The next pass through the outer DO increments
I_X by unity, giving the new address for the ROM as A(N_1) + 1 = A(N_2); as a
result, this time the inner DO is executed N_2 times. Continuing in this way,
access is made to all N_j, making it possible to individually control the number
of executions of the innermost DO LOOP.
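
The looping mechanism can be followed step by step with a small Python model
of the program counter, the I counter and the two comparators. It is a
behavioral sketch only: the function name and arguments are hypothetical, time
zones are ignored, and resetting I to zero on loop exit is an assumption taken
from the statement in Section 5.3 that I = 0 outside a DO LOOP.

```python
def run_program(length: int, do_line: int, df: int, dl: int, n: int) -> list[int]:
    """Return the sequence of program steps visited.

    length  -- number of program lines
    do_line -- line holding the DO instruction
    df, dl  -- first and last line of the loop body
    n       -- number of times the body is to be executed
    """
    pc, i, in_loop = 0, 0, False
    visited = []
    while pc < length:
        visited.append(pc)
        if pc == do_line:            # DO loads the DF, DL and N latches
            in_loop, i = True, 0
        if in_loop and pc == dl:     # comparator 1: counter setting equals DL
            i += 1                   # clock the I counter
            if i != n:               # comparator 2 still low: parallel enable
                pc = df              # counter is preset to DF
                continue
            in_loop, i = False, 0    # I == N: fall through, leave the loop
        pc += 1
    return visited
```

With a DO in line 1 and a two-line body in lines 2 and 3 executed three times,
the steps visited are 0, 1, 2, 3, 2, 3, 2, 3, 4, 5.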

5.7  IF,  COMP,  COMP-IF  and  GO  TO  Instructions 

The IF instruction produces a conditional jump. The decision may be up
to 3-way, depending on whether two numbers, A and B say, satisfy the A > B, A = B
or A < B conditions. The indicators used are the carry-out line from the ALU,
C_o, the A = B output and the most significant bits of the A and B words, MA and
MB, respectively. Table 5.2 lists all possible states. This table was used to
implement a hard-wired IF instruction.

                              TABLE 5.2

State   MA   MB   ALU C_o (Q_0)   A = B   Signs                      RESULT
  1     0    0         0            0     A positive; B positive     A > B
        0    1         0            0     A positive; B negative
        1    1         1            0     A negative; B negative
  2     0    0         1            1     A positive; B positive     A = B
        1    1         1            1     A negative; B negative
  3     0    0         1            0     A positive; B positive     A < B
        1    0         0            0     A negative; B positive
        1    1         0            0     A negative; B negative

Justification for the COMP instruction and a short description of its
operation was already provided in Section III and Eqs. (26) and (27). The two
quantities to be compared are the contents of the AB and BB buffers, respectively.

They will be referred to as AB and BB in the subsequent discussion. The
comparator used in this operation is shown in the top left corner of Fig. 6. Since
AB and BB may be positive or negative independently, the comparator must be
able to handle 2's complement numbers. The outputs from the comparator are
combined with the COMP lines to give

$$E_N = \overline{\mathrm{COMP}>} \cdot (A > B) + \overline{\mathrm{COMP}=} \cdot (A = B) + \overline{\mathrm{COMP}<} \cdot (A < B) \qquad (35)$$

The COMP lines are active low, whereas the comparator outputs are active high.
Therefore a comparison, if met, makes E_N = 1; if not met, E_N = 0. E_N controls
the select lines of 3 other multiplexers. During a COMP operation the ACCB control
lines are not used for Accumulator B control. This accumulator becomes
unavailable for other than shift-through operations. Similarly, the PHR
multiplexer select line PHR MUX and the write multiplexer control WR MUX are no
longer available for their prime duties. The two multiplexers are connected in
the 0 channel mode, i.e., they select the AR ALU output. The 3 control lines
ACCB0, ACCB1, and ACCB2 are one of the two sets of 3 inputs (channel 1) into a
multiplexer selected by E_N. The zero channel is S0 MUXA, S1 MUXA and S2 MUXA.
The outputs of this multiplexer drive the three select lines of MUXA. Thus, when
a comparison is not made, or is made but not met, the MUXA outputs are selected
by the control lines originally intended for the job, i.e., S0, S1 and S2 MUXA.
If, however, a COMP instruction is met, E_N = 1 forces ACCB0, ACCB1 and ACCB2 to
take over control of MUXA. This mechanism permits selection of different MUXA
inputs as the A word of the ALU depending on whether a comparison is met or not
met. The word to be selected if the comparison is met is specified by ACCB0,
ACCB1 and ACCB2; if it is not met, S0 MUXA, S1 MUXA, and S2 MUXA do the selection.
Similarly, the lines controlling the arithmetic operation are either the
originally intended ones, AR0 and AR1, if no comparison is made or one is made and
not met, or their place is taken by PHR MUX and WR MUX, respectively, when
E_N = 1. This channeling is effected via two further multiplexers shown in Fig. 6.
Since the two sets of controls are totally separated, completely independent
arithmetic operations may be performed on two ALU inputs depending on the outcome
of a comparison between AB and BB. The following are three examples selected
to illustrate the use of the COMP instructions. The notation will be described
in greater detail in Section 6.

1.  COMP> (½ AB, 2AB) (SFTA, SFTA)

2.  COMP= (ACCA, MUXY) (SFTA, +) CB ACCA

3.  COMP< (BB, AB) (-, SFTB) PHR ACCB

1.  This means: if AB > BB, shift ½ AB through to the natural output of
the arithmetic section (which is the WR MUX output). If the comparison is not
met, shift through 2AB.

2.  If AB = BB, clock the content of ACCA back into itself. This is
equivalent to saying that ACCA should not be disturbed. If the comparison is
not met, the content of MUXY is incremented by the content of buffer CB and the
result is clocked into ACCA.

3.  For AB < BB, clock BB - PHR into ACCB. If AB ≥ BB, place only the PHR
into ACCB. The second entry in the first bracket, i.e., AB, is in this case
just a dummy, since the ALU selects only channel B anyway. Any one of the MUXA
inputs could be used here; however, AB with a command of 0 0 0 is the most
convenient. Although internal ACCB operations are not possible during a COMP,
clocking into ACCB does not require ACCB0, ACCB1, or ACCB2 and is therefore
permissible.
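
The effect of a COMP line can be summarized in a short Python sketch. It
models only the selection performed by E_N (which A word and which ALU
operation are used), not the destination clocking; the function name, the
argument layout and the use of plain Python integers instead of 16-bit 2's
complement words are all assumptions made for illustration.

```python
import operator

RELATIONS = {">": operator.gt, "=": operator.eq, "<": operator.lt}

ALU_OPS = {
    "SFTA": lambda a, b: a,      # shift A through
    "SFTB": lambda a, b: b,      # shift B through
    "+": lambda a, b: a + b,     # A plus B
    "-": lambda a, b: a - b,     # A minus B
}


def comp(relation, ab, bb, a_words, alu_ops, b_word=0):
    """Evaluate COMP<relation> (a_met, a_not_met) (op_met, op_not_met).

    ab, bb   -- contents of the AB and BB buffers being compared
    a_words  -- (A word if the comparison is met, A word if it is not)
    alu_ops  -- (ALU operation if met, ALU operation if not met)
    b_word   -- the B input of the ALU (e.g. CB or PHR)
    """
    e_n = RELATIONS[relation](ab, bb)   # E_N = 1 when the comparison is met
    a_word = a_words[0] if e_n else a_words[1]
    alu_op = alu_ops[0] if e_n else alu_ops[1]
    return ALU_OPS[alu_op](a_word, b_word)
```

Example 1 then reads comp(">", ab, bb, (ab // 2, 2 * ab), ("SFTA", "SFTA")), and
example 3 reads comp("<", ab, bb, (bb, ab), ("-", "SFTB"), b_word=phr), which
yields BB - PHR when AB < BB and PHR alone otherwise.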

The COMP instruction may also be used as the decision stage for a conditional
jump. If an IF appears in the line following a COMP instruction, the program
counter will go to a location specified in ADR ROM C or continue in its natural
sequence, depending on whether the COMP was or was not met, respectively. The
operations specified by the COMP itself have no bearing on the conditional jump.
Table 5.3 combines all the relevant states and their consequences for both
kinds of conditional jumps.

                              TABLE 5.3

No.   IF   CPS*   (A = B or E_N)   Q_0   Meaning                      Action
 1    0     1           0           0    AB > BB  (conventional IF)   No P.E.
 2    0     1           0           1    AB < BB  (conventional IF)   P.E. to ADR ROM B
 3    0     1           1           X    AB = BB  (conventional IF)   P.E. to ADR ROM C
 4    0     0           1           X    Met      (COMP IF)           P.E. to ADR ROM C
 5    0     0           0           X    Not met  (COMP IF)           No P.E.
 6    1     X           X           X    No IF                        No P.E.

P.E. stands for parallel enable. X means a don't care state. The action
to be taken in line 2, for example, means: set the program counter to go to the
step specified in ADR ROM B on the next rising clock edge. The parallel
enable on the program counter is active low; therefore, in conjunction with
Table 5.3, the level on the P.E. input is

$$\mathrm{P.E.} = \mathrm{IF} + \overline{\mathrm{CPS}} \cdot \overline{E_N} + \mathrm{CPS} \cdot \overline{Q_0} \cdot \overline{E_N} \qquad (36)$$

The GO TO command is a straightforward unconditional jump. It comes from
the top line of the Page ROM and goes to the P.E. input on the program counter.
The address to which the program is to go is in ADR ROM C. The complete P.E. on
the program counter, taking Eq. (36), GO TO and EOD** into account, is

$$\mathrm{P.E.} = (\mathrm{IF} + \overline{\mathrm{CPS}} \cdot \overline{E_N} + \mathrm{CPS} \cdot \overline{E_N} \cdot \overline{Q_0}) \cdot \mathrm{GOTO} \cdot \overline{\mathrm{EOD}} \qquad (37)$$

*  CPS = COMP> · COMP= · COMP<.

** EOD = END of DO output. This goes high at the end of a DO LOOP.

The relevant data controlling the select lines on the program counter
multiplexers is collected in Table 5.4.


                              TABLE 5.4

IF   GOTO   (A = B or E_N)   Q_0   S1   S0   Select       Meaning
0     0           X           X    X    X    X            Does not occur
0     1           0           1    0    1    ADR ROM B    AB < BB in conventional IF
0     1           1           X    1    0    ADR ROM C    AB > BB in conventional IF
1     0           X           X    1    0    ADR ROM C    Unconditional jump
1     1           X           X    1    1    DF LATCH     Static condition; used in DO

From the above

$$S_0 = (\mathrm{IF} + Q_0 \cdot \overline{E_N}) \cdot \mathrm{GOTO}$$
$$S_1 = \mathrm{IF} + E_N \cdot \mathrm{GOTO} \qquad (38)$$
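
The jump logic of Tables 5.3 and 5.4 can be checked row by row with a small
Python sketch. The Boolean expressions below are one reading that reproduces
every row of the two tables; they are an assumption of this sketch, since the
printed equations are only partly legible in this copy. All arguments are line
levels as tabulated (the IF and GO TO lines are at 0 when the command is
present), and the function names are hypothetical.

```python
def pe_level(if_n, cps, e_n, q0, goto_n=1, eod=0):
    """Level on the active-low P.E. input: 0 presets the counter, 1 lets it count."""
    no_jump = (
        if_n                                   # no IF in this line
        or ((not cps) and (not e_n))           # COMP-IF, comparison not met
        or (cps and (not e_n) and (not q0))    # conventional IF, AB > BB
    )
    return int(bool(no_jump) and bool(goto_n) and not eod)


def mux_select(if_n, goto_n, e_n, q0):
    """Select lines (S1, S0) of the program counter multiplexer."""
    s1 = if_n or (e_n and goto_n)
    s0 = (if_n or (q0 and not e_n)) and goto_n
    return int(bool(s1)), int(bool(s0))


# Multiplexer inputs by (S1, S0); the 0 0 input is unconnected.
MUX_INPUTS = {(0, 1): "ADR ROM B", (1, 0): "ADR ROM C", (1, 1): "DF LATCH"}
```

For instance, line 2 of Table 5.3 (IF = 0, CPS = 1, E_N = 0, Q_0 = 1) gives a P.E.
level of 0 and select lines 0 1, so the counter is preset from ADR ROM B.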


5.8  Arithmetic  ALU  and  Accumulator  B 

The inputs into the arithmetic ALU are fed by two 8-input multiplexers
referred to as MUXA and MUXB (see Fig. 6). Selection of outputs is conventionally
handled by six dedicated lines: S0 MUXA, S1 MUXA and S2 MUXA for multiplexer A
and S0 MUXB, S1 MUXB and S2 MUXB for the other. An exception to this rule, for
multiplexer A alone, occurs only during the COMP instruction described in the
previous section. Table 5.5 lists all available inputs and their locations.

                              TABLE 5.5

S0   S1   S2     MUX A           MUX B
0    0    0      AB              BB
0    0    1      ½ AB            ½ BB
0    1    0      2 AB            2 BB
0    1    1      ACCA            16-
1    0    0      ^HF             CB
1    1    0      MUX Y           ACCA
1    1    1      EXT or ZERO     PHR or EDIT

The first 3 lines in each multiplexer are the AB and BB buffer outputs:
direct, scaled by 1/2, and scaled by 2. This is the fastest way of achieving
scaling. MUX B also provides a further scaling (the fourth MUX B entry in
Table 5.5). This is used for quick implementations of Eq. (20)
and also in non-real pitch evaluation of the window function Eq. (13). ACCA is
available in both multiplexers. The reason for this is that the ALU can only
provide A minus B. In this way ACCA minus X or X minus ACCA are both possible.
For the same reason BB appears in MUX A as well. 2 ACCA was found very useful
for a number of operations. MUX Y is the multiplexer Y output. This permits
constants, chiefly used for addressing, to be brought into the arithmetic
section. Since MUX Y is in the second time zone, its output has to be buffered
as shown in Fig. 5. CB permits constants from the AROMs to be available in the
arithmetic section. The last setting in the MUX A column contains both EXT and
ZERO since this address makes the output of receive decoder MUX 2 available.
Whether what comes in on lines A is one of the 4 inputs into the multiplexer
labeled EXT, or ZERO, depends on further commands out of decoder 4 for EXT and
decoder 2 for ZERO. Position 7 in MUXB is taken up by the output of the product
hold register, labeled PHR; in addition, its two least significant lines are
connected through a two-input
multiplexer  to  X*5  X5  (described  in  Section  2.2  when  the  editing  procedure  was 

explained)  and  the  two  least  significant  bits  of  the  PHR  output.  Edit,  which 
happens  only  once  during  the  program,  is  controlled  by  an  unused  setting  of  the 
ACCB controls, i.e., ACCB_0 = 0, ACCB_1 = 1, ACCB_2 = 1. Under this control (with
111  on  MUXB  select)  the  product  hold  register  is  cleared  to  zero  and  the  two 
lines  X’^  X^  are  channeled  through. 

The arithmetic ALU is required to perform only 4 different operations:
shift A, A minus B, A plus B and shift B. Two dedicated control lines, AR_0 and
AR_1, are used. The control on the ALU packages themselves (the 74S181s) requires
6 lines. This and other relevant data are summarized in Table 5.6.


TABLE 5.6

                            ALU Controls                Enable
AR_0  AR_1   Meaning    M   C_n  S_3  S_2  S_1  S_0    MUXA  MUXB

 0     0     SHIFT A    0    0    0    1    1    0      0     1
 0     1     A - B      0    0    0    1    1    0      0     0
 1     0     A + B      0    1    1    0    0    1      0     0
 1     1     SHIFT B    0    1    1    0    0    1      1     0

Only the ALU controls for A - B and A + B are used. For SHIFT A the A - B
command is used and B is set to zero. Similarly, when just B is required, the
A + B command is used and A is forced to zero by disabling MUXA. It will be seen
that the two sets of controls are duals of each other for 5 of the 6 controls,
and the first one, M, is zero throughout. A very straightforward and
consequently low-delay realization is possible. The system is given in Eq. (39).


        M = 0

        C_n = S_3 = S_0 = AR_0                                  (39)

        S_2 = S_1 = \overline{AR_0}

and the MUX enable lines

        ENABLE MUXA = AR_0 \cdot AR_1
                                                                (40)
        ENABLE MUXB = \overline{AR_0} \cdot \overline{AR_1} .
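
The decoding in Eqs. (39) and (40) can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary keys are assumptions made here, and the complemented terms are taken from the values tabulated in Table 5.6, where an enable value of 1 forces the corresponding ALU input to zero.

```python
def alu_controls(ar0: int, ar1: int) -> dict:
    """Decode the two AR lines into the six 74S181 controls and two enables."""
    not_ar0 = ar0 ^ 1
    not_ar1 = ar1 ^ 1
    return {
        "M": 0,                      # arithmetic mode throughout
        "Cn": ar0,                   # Eq. (39): C_n = S_3 = S_0 = AR_0
        "S3": ar0,
        "S0": ar0,
        "S2": not_ar0,               # Eq. (39): S_2 = S_1 = complement of AR_0
        "S1": not_ar0,
        "EN_MUXA": ar0 & ar1,        # Eq. (40): 1 only for SHIFT B (A forced to zero)
        "EN_MUXB": not_ar0 & not_ar1,  # Eq. (40): 1 only for SHIFT A (B forced to zero)
    }


MEANING = {(0, 0): "SHIFT A", (0, 1): "A - B", (1, 0): "A + B", (1, 1): "SHIFT B"}

if __name__ == "__main__":
    for (ar0, ar1), name in MEANING.items():
        print(name, alu_controls(ar0, ar1))
```

Because the six ALU controls depend on AR_0 alone, the two shift operations reuse the subtract and add settings and differ from them only in which multiplexer is disabled.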

As already mentioned during the description of the COMP instruction, the
PHR MUX and WR MUX commands replace AR_0 and AR_1, respectively, when a comparison
is met. The enable lines on MUXA and MUXB and also the ALU controls are therefore
available through multiplexers controlled by E^. Besides the above and the
operations performed in conjunction with the IF, one further command is
associated with the ALU. This is DOIAR, which stands for direct or inverse
operation in arithmetic ALU. It will be noticed from Table 5.6 that the
commands SHIFT A and SHIFT B are logic duals of each other, as are those of
A - B and A + B. This choice is deliberate and is utilized to provide a
single-line conditional command. The condition used is the value of the least
significant bit of multiplexer Y (designated LSBY). The function
\overline{DOIAR} \cdot LSBY controls a pair of exclusive ORs, the other inputs
into which are AR_0 and AR_1. When DOIAR is not used its value is high and
consequently, irrespective of LSBY, the exclusive ORs transmit AR_0 and AR_1
unchanged. During a DOIAR command the outputs of the exclusive ORs are AR_0 and
AR_1 if LSBY = 0, and \overline{AR_0} and \overline{AR_1} if LSBY = 1. The
examples below illustrate the usage of the command:
        1.  DOIAR (ACCA, CB) → ACCB

        2.  DOIAR (AB - PHR) → ACCB

1.  Leaves value of ACCA unchanged if LSBY = 0 or replaces it by CB if
    LSBY = 1.

2.  Clocks AB - PHR into ACCB if LSBY = 0, and AB + PHR if LSBY = 1.

It should also be pointed out that the LSBY may assume different values.
Thus  inside  a DO  LOOP  LSBY  will  alternate  between  0 and  1 on  consecutive  passes. 
If  the  ACCB  or  AB  inputs  on  MUXY  are  used  the  evenness  or  oddness  of  the  numbers 
contained  there  will  be  the  controlling  factor. 
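
The conditional inversion performed by DOIAR can be sketched as follows. This is a minimal illustration rather than the actual gate-level design: the function name is an assumption, and the DOIAR line is modeled as active low because the text states that it is high when the command is not used.

```python
def doiar_lines(ar0: int, ar1: int, doiar_n: int, lsby: int) -> tuple:
    """Return the (AR_0, AR_1) pair seen by the ALU decoder.

    doiar_n is the DOIAR control line: 1 when the command is not used,
    0 while a DOIAR command is being executed.
    """
    flip = (doiar_n ^ 1) & lsby      # 1 only during DOIAR with LSBY = 1
    return ar0 ^ flip, ar1 ^ flip    # the pair of exclusive ORs


if __name__ == "__main__":
    # A - B is coded AR_0 = 0, AR_1 = 1; its dual A + B is AR_0 = 1, AR_1 = 0.
    print(doiar_lines(0, 1, doiar_n=0, lsby=0))   # direct operation
    print(doiar_lines(0, 1, doiar_n=0, lsby=1))   # inverse operation
```

Complementing both lines maps SHIFT A onto SHIFT B and A - B onto A + B, which is why the codes in Table 5.6 were chosen as duals.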

The arithmetic ALU output is channeled through a specially arranged set of
two-input multiplexers. Their purpose is to provide saturation. Operation of
this kind implies non-modulo arithmetic. When a positive number exceeding the
largest permissible representation is to appear on the ALU output, the output
is forced to this largest permissible value rather than "wrapping around" as
would be the case in ordinary modulo arithmetic. Under similar conditions for
negative numbers the output is clamped to the most negative permissible value.
In the representation used here, this implies 1 - 2^{-15} and -1 as the two
limiting values. The choice of this kind of computational strategy is dictated
by stability considerations. It may be shown that in any kind of computation
which contains feedback (examples of this are all the digital filters used
here), a self-sustaining instability is likely if overflow with conventional
"wrap around" occurs. Furthermore, this will be so even if all else is totally
ideal. Saturation arithmetic, on the other hand, guarantees complete stability
even if accidental overflow were to occur. Table 5.7 summarizes and comments on
all possible states. SA, SB and SF are the sign bits of the A and B inputs and
of the ALU output. OP is the operation, with OP = 0 for Add and OP = 1 for
Subtract. Actually OP is equivalent to \overline{AR_0}. The last column, labeled
Output, indicates whether the result of the operation is correct (in which case
an O.K. appears) or, in the event of an overflow, what kind of clamping is
needed. The State column is the decimal equivalent of OP SA SB. The clamping
multiplexer following the ALU has one of its inputs (channel 0) connected to
the ALU output. The other has the inverted sign bit, i.e., \overline{SF},
connected to the most significant bit and SF itself to all others.

TABLE 5.7

State  OP  SA  SB  SF   Comments                                               Output

  0     0   0   0   0   A + B, both positive, answer positive                  O.K.
  0     0   0   0   1   A + B, both positive, answer negative                  positive clamp
  1     0   0   1   X   Effective subtract of 2 positive numbers               O.K.
  2     0   1   0   X   Effective subtract of 2 positive numbers               O.K.
  3     0   1   1   1   A + B, both negative, answer negative                  O.K.
  3     0   1   1   0   A + B, both negative, answer positive                  negative clamp
  4     1   0   0   X   A - B, both positive                                   O.K.
  5     1   0   1   0   Effective add of 2 positive numbers, answer positive   O.K.
  5     1   0   1   1   Effective add of 2 positive numbers, answer negative   positive clamp
  6     1   1   0   0   Effective add of 2 negative numbers, answer positive   negative clamp
  6     1   1   0   1   Effective add of 2 negative numbers, answer negative   O.K.
  7     1   1   1   X   A - B, both negative                                   O.K.
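
The overflow rule tabulated above can be sketched in Python. This is a hedged illustration, assuming 16-bit two's complement words; the function name and the constants are introduced here for the example and are not taken from the hardware description.

```python
BITS = 16
MASK = (1 << BITS) - 1          # 0xFFFF
MSB = BITS - 1


def saturating_alu(a: int, b: int, op: int) -> int:
    """Add (op = 0) or subtract (op = 1) two 16-bit words with clamping.

    a and b are 16-bit patterns; the result is a 16-bit pattern.
    """
    raw = (a + b if op == 0 else a - b) & MASK   # ordinary wrap-around result
    sa = (a >> MSB) & 1
    sb = (b >> MSB) & 1
    sf = (raw >> MSB) & 1
    sb_effective = sb ^ op       # subtracting B acts like adding a number of opposite sign
    overflow = (sa == sb_effective) and (sf != sa)
    if not overflow:
        return raw               # channel 0 of the clamp multiplexer: ALU output
    # Clamp channel: inverted SF on the most significant bit, SF on all others.
    low_bits = (MASK >> 1) if sf else 0
    return ((sf ^ 1) << MSB) | low_bits


if __name__ == "__main__":
    print(hex(saturating_alu(0x7FFF, 0x0001, 0)))   # positive clamp
    print(hex(saturating_alu(0x8000, 0x0001, 1)))   # negative clamp
```

Overflow is possible only when the two effective operands share a sign, which matches the states marked with X in the SF column of Table 5.7.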

ACCB consists of an ALU, also with saturation clamping. Although the unit is
described as the ACCB, the content of the register is usually implied when
a number is referred to as ACCB. The unit is intended primarily for arithmetic
operations and is therefore designed for 16-bit words. In addressing, only
half the number is needed and so only the top 8 bits of ACCB are channeled over
to MUXY. Three dedicated lines designated ACCB_0, ACCB_1 and ACCB_2 control
operations. These are listed in Table 5.8.

TABLE 5.8

                               Internal   PHR
No.  ACCB_0  ACCB_1  ACCB_2    MUX E      MUX E    Meaning       Comments

 0     0       0       0         1          0      SHIFT EX      ALU in A+B with A=0
 1     0       0       1         0          0      ACCB + EX     Straight A+B
 2     0       1       0         1          X      Not used
 3     0       1       1         1          X      Edit
 4     1       0       0         0          1      ACCB          ALU in A-B with B=0
 5     1       0       1         0          1      ACCB + 1      ALU in A+B, 8th bit = 1, B=0
 6     1       1       0         0          0      ACCB - EX     Straight A-B
 7     1       1       1         1          0      SHIFT - EX    ALU in A-B with A=0

EX = external input; E = MUX enable line.

Operation 5 above, i.e., ACCB + 1, has a 1 in the 8th bit down. In this
way the ACCB can be incremented for addressing purposes. This is achieved by
disabling the PHR MUX and forcing a 1 into the 8th carry bit. AO detects this
state since

        AO = (ACCB_0 + ACCB_1 + ACCB_2) \cdot CPS .             (41)

It is fed through a NAND gate to give the required carry.

The MOD and DOIB commands are also associated with ACCB.
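
The effect of the ACCB + 1 operation on the address can be sketched as follows. This is an assumption-laden illustration: ACCB is taken to be a 16-bit word whose top 8 bits form the address sent to MUXY, so that a 1 in the 8th bit down corresponds to adding 0x0100. Saturation clamping is ignored, and both function names are hypothetical.

```python
def accb_increment_address(accb: int) -> int:
    """Add 1 at the 8th bit down, i.e. increment the top byte of a 16-bit word."""
    return (accb + 0x0100) & 0xFFFF


def address_bits(accb: int) -> int:
    """Top 8 bits of ACCB, the part channeled to MUXY for addressing."""
    return (accb >> 8) & 0xFF


if __name__ == "__main__":
    word = 0x05A3
    print(address_bits(word), address_bits(accb_increment_address(word)))
```

The low byte passes through unchanged, so an arithmetic value and an address counter can share the same register.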


5.9  Array  Multiplier 

Figure 9 gives a very rough schematic of the system used. The boxes with
numbers in them are Am25S05 4-by-2, 2's complement multipliers. A block labeled
i - j refers to multiplier bits i and i + 1 and multiplicand bits j through
j + 3. Neither the multiplier nor the multiplicand lines are shown, but their
presence and position can be inferred from the labeling. Since only the 16 most
significant bits are used in the product, not all multiplier blocks are needed,
thereby saving on hardware. More detailed information on such arrays is given
in the manufacturer's data sheets. The particular arrangement used here produces
a 16-bit product in typically 80 nsec.

The  multiplier  in  the  arithmetic  section  is  supplied  by  the  MULT  Register. 
This  in  turn  is  clocked  in  every  cycle  by  CLL  and  therefore  always  contains  the 
output  of  the  arithmetic  ALU.  The  multiplicand  is  the  CB  buffer  output.  This 
buffer  is  clocked  only  on  the  dedicated  command  CLCB.  Since  the  multiplier  is 
memoryless  a new  product  of  the  current  ALU  output  word  and  CB  is  constantly 
being produced. The multiply command MULT is used only as a clock on the
product hold register PHR. Thus MULT results in the current product being
retained for further use. This value is available until the next MULT command
appears.
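
The register timing described above can be sketched in Python. This is a simplified model under stated assumptions: words are 16-bit two's complement, the product is taken as the top 16 bits of the full 32-bit result (the exact alignment and rounding of the hardware array are not modeled), and the class and method names are illustrative only.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def product_top16(multiplier: int, multiplicand: int) -> int:
    """Keep only the 16 most significant bits of the 32-bit product."""
    full = to_signed16(multiplier) * to_signed16(multiplicand)
    return (full >> 16) & 0xFFFF


class MultiplierPath:
    def __init__(self):
        self.mult_reg = 0   # MULT register, reloaded with the ALU output every CLL cycle
        self.cb = 0         # CB buffer, loaded only on the CLCB command
        self.phr = 0        # product hold register, loaded only on the MULT command

    def cycle(self, alu_out: int, mult: bool = False, clcb_value=None) -> None:
        """Advance one clock period."""
        # The array is memoryless, so the product of the present register
        # contents always exists; MULT merely latches it into the PHR.
        if mult:
            self.phr = product_top16(self.mult_reg, self.cb)
        if clcb_value is not None:
            self.cb = clcb_value & 0xFFFF
        self.mult_reg = alu_out & 0xFFFF


if __name__ == "__main__":
    path = MultiplierPath()
    path.cycle(0x4000, clcb_value=0x4000)   # load both operands
    path.cycle(0x0000, mult=True)           # latch the product
    print(hex(path.phr))
```

Once latched, the PHR keeps its value through any number of later cycles until another MULT is issued.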

Fig. 9. 16 x 16 multiplier array with 16-bit product.

5.10  Intermediate Memory

Besides the RAMs, which could also be classified as intermediate memory,
a number of single-word intermediate memories are available. The division into
time zones, necessitated by the need for pipelining, provides storage in the
dividing buffers. Not everywhere, however, is use made of this facility. Thus,
all buffers clocked by CLL directly will store their contents for one clock
period only. Others, like the AB, BB and CB buffers, perform the dual role of
(1) isolation of the 2nd and 3rd time zones and (2) intermediate storage. This
second effect is achieved by providing dedicated clocking lines for each of
them. In this way the content on the inputs of these buffers is stored in them
only when the clocking command is given. Storage of given words for an
arbitrary number of computational periods is therefore possible. CBI in the
second time zone (Fig. 5) is also a storage register. Also, an existing word
may be incremented by unity. Exactly the same is true of the counter. Both of
these registers are 8 bits wide. The multiply register, clocked by CLL, does
not provide intermediate storage; the PHR, however, does. This was already
described in the previous section. ACCA can provide storage since a clock ACCA
(CL ACCA) is available, and finally ACCB, with CL ACCB provided, is also an
intermediate store.

5.11  ROM  Coding 

At  a number  of  points  of  this  report  mention  has  been  made  of  the  ROMs  used 
to implement memoryless Boolean functions. This section lists and describes
them all.

The ENCODER ROM implements Eqs. (24). The inputs are five A lines together
with S_11 and S_21, occupying the least significant through the seventh address
bit. The 8th and most significant bit is grounded. The four outputs are labeled
O_1, O_2, O_3 and O_4. The encoder ROM translates the computed output words
(stored in the output buffer) into the required format used for serial data
transmission. The ROM output feeds the Data Register (Fig. 7).
HAD  ROM  produces  the  expression  given  in  Eq.  (28)  as  one  of  its  outputs


this  is  denoted  by  Another  line,  denoted  is  used  to  give  the  complement, 

i.e.,  The  eight  inputs  are  the  i and  j words.  The  two  remaining  outputs 

are  used  to  produce 


(42) 


^3  ^2  ' ^1  ' ^0  ^2  ^^0  ^1^ 


S.  will  be  seen  to  be  zero  for  lines  1 and  3 in  Table  2.4.  is  zero  for 

A B 

lines  1 and  2.  They  are  therefore  numerically  equal  to  and  respectively. 

Actually

        S_11 = S_A \cdot TS
                                                                (43)
        S_21 = S_B \cdot TS

where TS = 0 during a spectrum computation and 1 during pitch. More details on
this parameter are given in Section 5.14.

The TIMING ROM is driven by the 'i' word and TS. It is used to produce, in
conjunction with other circuitry described in Section 5.14, timing pulses for
data input and output. Table 5.9 lists the requirements. The R_i are the ROM
outputs and the last column represents clocking pulses. More about these will
be found in Section 5.14.
TABLE 5.9

TS    i              R_3  R_2  R_1  R_0    No. of CP pulses

 0    0               0    0    1    0          5
 0    1, 2, 3, 4      0    0    1    1          4
 0    5, 6, 7         0    1    0    1
 0    8, ..., 15      0    1    1    0          1
 1    X               0    0    0    0          7

The EDIT ROM has already been mentioned in conjunction with the editing
procedure in Section 2.2. Specifically, the EDIT ROM accepts the 8 decision bits
generated during the editing process as a parallel address and provides one line
only  labeled  X^.  The  logic  implemented  is  given  in  Eq.  (17), 

Finally, there are the two sets of AROMs, designated ROM A_1, storing
constants, and ROM A_2, used for the ALOG and EXP commands described in Section 5.5.

5.12  Data I/O, Acquisition and Synchronization

The  data  input/output  facility  at  the  machine  end  is  provided  by  the  Data 
Register  (Fig.  7).  The  encoded  word  for  transmission  is  clocked  into  it  in 
parallel.  The  MSB  of  this  word  appears  always  on  line  OP^  of  the  register  out- 
put. Clocking  pulse  CP^  provides  the  required  clocking  edge  for  this  register, 
when clock output buffer (CLOB) is activated.

CLOB  = S^  MUX  A + AR^  + . (44)

An edge-triggered F/F controlled by CP^ and CLOB in turn controls the select
line S_1 on the data register. With S_0 = S_1 = 1 the register (a set of 74S194s)
is in parallel load mode. CLOB puts S_1 = 1, and CP^, which always comes in
the following line, puts S_1 back to zero. Thus a parallel load results only
immediately after CLOB. At all other times the select lines are in the state
S_0 = 1, S_1 = 0, which implies a shift right every time a clocking edge is
applied. The
clock  on  the  register  may  be  either  CP^  or  . The  latter  is  used  in  bursts  of 
appropriate  length  (see  Section  5.14)  to  shift  right  the  available  data.  Thus 
the  first  edge  of  CP  puts  the  most  significant  bit  available  on  CP.  into  CP-,. 

It also pulls into CP^ the bit applied to the DSR input on the Data Register.
For the first spectrum word CP^ will have five edges. After such a clocking
sequence, the five bits originally present will have been clocked out via OP^
and five new bits now occupy positions OP^ to OP^. Another clocking sequence
CP^, whose edges occur half a period behind CP^, is used to clock the content
of OP^ serially into a first-in first-out memory. Thus, on completion of this
procedure the original content of the data register has been transferred into
the FIFO, designated output FIFO. The number of clocking edges in the sequence
CP^ (and as a consequence in CP^ as well) is given in the last column of
Table 5.9.

These  sequences  correspond  to  the  number  of  bits  in  the  encoded  final  output 
words  as  described  in  Section  2.4.  The  received  data,  which  appears  on  the  bottom 
lines  of  the  Data  Register  after  completion  of  a CP^  - CP^  clocking  sequence  is 
treated  as  a word  equivalent  to  the  one  just  clocked  into  the  output  FIFO.  Thus 
if  CPj^  has  five  edges  the  receive  word  is  assumed  to  represent  the  first  spec- 
trum word  and  so  on.  These  received  words  are  then  decoded  as  described  in 
Section 2.4 by a hard-wired multiplexer designated Receive Decoder MUX 1 (Fig. 7).
The output of this multiplexer is the 10 input on Receive Decoder MUX 2.
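
The simultaneous shifting out of a transmit word and shifting in of a received word can be illustrated with a toy model. This is a sketch under assumptions, not a description of the actual wiring: the register width of 8 bits, the class name, and the convention that the last list element is the output end are all choices made here for illustration.

```python
class DataRegister:
    """Toy shift register; index -1 is the end from which bits are clocked out."""

    def __init__(self, width: int = 8):
        self.bits = [0] * width

    def parallel_load(self, bits):
        """Select lines S_0 = S_1 = 1: load a whole word at once."""
        if len(bits) != len(self.bits):
            raise ValueError("word width mismatch")
        self.bits = list(bits)

    def shift_right(self, dsr_bit: int) -> int:
        """Select lines S_0 = 1, S_1 = 0: one shift per clocking edge.

        The bit at the output end leaves the register and the bit on the
        serial (DSR) input enters at the other end.
        """
        out = self.bits[-1]
        self.bits = [dsr_bit] + self.bits[:-1]
        return out


def exchange(register: DataRegister, received_bits):
    """Shift out one transmit bit for every received bit shifted in."""
    return [register.shift_right(b) for b in received_bits]


if __name__ == "__main__":
    reg = DataRegister()
    reg.parallel_load([0, 0, 0, 0, 1, 1, 0, 1])   # five-bit word at the output end
    print(exchange(reg, [1, 1, 0, 0, 1]), reg.bits)
```

After a burst of five edges the five transmit bits have left the register and five received bits occupy the positions next to the serial input, which is the exchange the text describes for the first spectrum word.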

In order to ensure that the received bits do indeed have the interpreted
meaning, it is necessary to synchronize the received data stream with the
internal CP clocks. Since the input data is at 2.4 kHz whereas the CP pulses are
at a vastly different rate, a FIFO is again used as a buffer. This is
designated Receive FIFO in Fig. 7. The input into this device is clocked at the
2.4 kHz rate, externally supplied as the input data clock. The system first
requires frame synchronization, i.e., it recognizes spectrum and pitch words
and lines them up in a specially provided shift register. It then waits until
the machine itself is ready to accept data. When this happens (referred to as
data sync), data is taken out of this shift register and processed while new
data is entered into the input FIFO ready for transfer into the shift register.

5.13  Pitch  Decoding 

A one-shot triggered at the appropriate time clocks the received pitch
word into a RAM capable of storing 16 8-bit words. The address is generated
by a 4-bit counter and channeled via a MUX. Only during the write cycle is the
address supplied exclusively by the counter. When in read, an offset Δ is added
to the address. Since Δ is manually adjustable this permits an effective delay
to be introduced between the just-written and just-read pitch words. This delay
adjustment is used to equalize any time offsets that may exist between the
received pitch and the spectrum. A buffer stores the currently read pitch word.

The buffer output is compared against the Hiss word. Its complement is also
used to preset a counter whenever that counter becomes full; the counter is
clocked by TCC, the program counter overflow. A gating arrangement ensures that
for non-hiss words the pitch multiplexer (Fig. 7) selects the impulse 1000
(octal) and is otherwise connected to zero. In this way an impulse separated in
time by the multiple of computational periods specified by the received pitch
word is generated.

The output of the pitch multiplexer is available to the machine via the EXT
input on MUXA and the 01 input on the Receive Decoder MUX 2. If a Hiss word is
received the pitch multiplexer is connected to the 0 0 or 1 1 states under
pseudorandom number control. The 0 0 position supplies a positive unit pulse,
whilst the 1 1 position gives a negative unit pulse. As a consequence, pitch
excitation during Hiss is a noise signal of unit amplitude but random sign. The
pseudorandom bit generator is capable of providing a random sequence with 2^18
members. Since the sampling clock averages 140 µsec (see next Section) the
sequence repetition period is 2^18 x 140 µsec = 36.7 sec. This is sufficiently
long not to generate a noticeable repetitive pattern.
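
The two excitation modes and the repetition period quoted above can be sketched as follows. This is only an illustration: the function names are assumptions, and Python's `random` module stands in for the hardware pseudorandom bit generator, whose internal structure is not described here.

```python
import random


def voiced_excitation(period: int, n: int):
    """Unit impulse every `period` computational periods, zero in between."""
    return [1 if k % period == 0 else 0 for k in range(n)]


def hiss_excitation(n: int, seed: int = 1):
    """Unit-amplitude pulses of random sign (stand-in random source)."""
    rng = random.Random(seed)
    return [1 if rng.getrandbits(1) else -1 for _ in range(n)]


SEQUENCE_LENGTH = 2 ** 18          # members in the pseudorandom sequence
SAMPLE_PERIOD_S = 140e-6           # average sampling clock period
REPETITION_S = SEQUENCE_LENGTH * SAMPLE_PERIOD_S

if __name__ == "__main__":
    print(voiced_excitation(3, 7))
    print(hiss_excitation(8))
    print(round(REPETITION_S, 1), "seconds")
```

The product 262144 x 140 µsec comes to roughly 36.7 seconds, in agreement with the figure in the text.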

5.14  Timing 

In order to maximize program throughput speed a half frame's worth of
computing is done without any pauses. Since program execution times will vary
depending on input data, the intervals between TCC pulses will also vary, making
the internal machine clock non-uniform. On the other hand, the input speech
samples are received at a uniform 140 µsec sample rate. Therefore, the
effective internal sample rate has to average 140 µsec or less over each half
frame. A variation of 130 µsec to 150 µsec can be expected. The internal
clocking rate CLL is adjusted until over a half frame the average execution
time is just under 140 µsec. Generally, there will be an irrational number of
speech samples in a half frame. Since only integer values are acceptable, an
integer N_HF is generated each 1/2 frame such that the various values of N_HF
averaged over many frames approximate the above irrational number. In this way
the frame rate and sample rate may be synchronized to each other, preventing
relative slippage. In the machine itself, using software, N_MK is generated by
unit incrementing once every program run. At the end of the program a
comparison is made. As long as N_MK is less than N_HF no action is taken. When
N_MK = N_HF, the command CMCL is generated, which stops the system clock CLL.
The system at that time has used up N_HF samples and so it has to wait until a
new set of N_HF samples has been stored in the sample FIFOs.
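
One way to produce integer half-frame counts whose long-run average matches a non-integer target is to carry the fractional remainder forward. The sketch below illustrates that idea only; the text does not say how the integer count is actually derived, so the accumulator scheme, the function name, and the example target of 80.5 samples are all assumptions.

```python
from fractions import Fraction


def half_frame_counts(samples_per_half_frame: Fraction, n_half_frames: int):
    """Return one integer sample count per half frame.

    The fractional part left over after each half frame is carried into
    the next one, so the running total never drifts from the target.
    """
    carry = Fraction(0)
    counts = []
    for _ in range(n_half_frames):
        carry += samples_per_half_frame
        n_hf = int(carry)          # whole samples available this half frame
        carry -= n_hf              # keep only the fractional remainder
        counts.append(n_hf)
    return counts


if __name__ == "__main__":
    counts = half_frame_counts(Fraction(161, 2), 6)   # hypothetical target of 80.5
    print(counts, sum(counts) / len(counts))
```

With this scheme the total number of samples consumed after any number of half frames differs from the ideal total by less than one sample, which is what prevents slippage between frame rate and sample rate.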

5.15  Voice  Analog  Section 

A schematic of this section is shown in Fig. 8. The voice output from a
microphone is fed into an amplifier with 6 to 21 dB of variable gain. The
output feeds a presample low-pass filter cutting off at 3.3 kHz. This is an 8th
order elliptic filter (C0815c, θ = 71°) with zero gain at DC. The output is
split into two paths. One goes via a preemphasis network, again with zero DC
gain, into a sample and hold. The other is band-pass filtered by an 8th order
Butterworth 80 to 600 Hz filter. Its output is also sampled by a sample and
hold module.

The first of the two is the spectrum path, the second the pitch path.
Preemphasis of spectrum samples has been found empirically to improve final
speech quality. Fundamental pitch frequencies lie in the range of 80 to 200 Hz;
the band-pass filter therefore permits up to the third harmonic of the highest
fundamental to come through. Spectrum and pitch samples are transferred into an
A/D converter during alternating periods. Two clocks, denoted by SHS and SHP,
respectively for spectrum and pitch sampling, are generated as follows: A
presettable 8-bit wide counter has its parallel input hard-wired to 220. A
1-MHz clock is used, giving a 35 µsec period from preset to full (255). The 255
state, characterized by the overflow TC = 1, is used as a parallel enable for
the counter and is a 1 µsec wide pulse with a 35 µsec period.

The acquisition time of the sample and hold modules, defined as the time
an unchanging input must be maintained to get the specified accuracy, is
35 µsec. Both SHS and SHP are maintained high for exactly that length of time.
The 140 µsec sample period is divided into four 35 µsec zones. Starting with
both SHS and SHP low for 35 µsec, SHS goes high. This lasts for 35 µsec whilst
SHP is still low; in the third zone both are low again and in the fourth SHP
alone is high. A high is the sample state and during a low the sample is held.
The A/D conversion time (to 12-bit accuracy) is 30 µsec. The strobe input into
the A/D on transition from 0 to 1 resets the converter to zero and sets the
"busy bit" (also referred to as the status) to 1. When the strobe goes low,
conversion begins. In our system, starting with the spectrum channel, as the
strobe goes high the A/D is reset and the status goes high, disabling both the
FIFOs and the px(n + 72) buffer. One microsecond later, the strobe goes low and
SHS goes high, initiating a spectrum sample. The FET switch (active low)
channels the pitch S&H to the A/D. This will have already been converted in the
pitch S&H into a steady level. Thus the A/D starts its conversion cycle on the
pitch sample. Some 30 µsec later the conversion is complete, at which time the
status goes low. Since SRCL goes high, this transition clocks the A/D output
into the px(n + 72) buffer as well as into the FIFOs. In the meantime the
spectrum sample period (35 µsec) is being completed. By the time the next
strobe comes along the spectrum sample has already been held for some 35 µsec;
also, the FET switch channels the spectrum sample into the A/D and the spectrum
conversion in the A/D begins.

At the same time the next pitch sample is already being taken into the pitch
S&H. On completion of the spectrum conversion the status output on the A/D
clocks this into the FIFOs but not into the px(n + 72) buffer, since SRCL is now
toggled into the low state and is not producing a clocking edge. The cycle then
repeats. The FIFOs contain both spectrum and pitch samples (alternating) whilst
the px(n + 72) buffer holds pitch samples only.
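
The arithmetic behind the sampling clocks can be checked with a short sketch. The constants come from the text; the function names and the zone numbering (0 through 3) are assumptions introduced for this example.

```python
CLOCK_HZ = 1_000_000    # 1-MHz counter clock
PRESET = 220            # hard-wired parallel input of the 8-bit counter
FULL = 255              # terminal count


def zone_length_us() -> float:
    """Time from preset to full, in microseconds."""
    counts = FULL - PRESET
    return counts * 1_000_000 / CLOCK_HZ


def sample_hold_levels(zone: int):
    """Return (SHS, SHP) for a zone; 1 means sample, 0 means hold."""
    schedule = {
        0: (0, 0),   # both low
        1: (1, 0),   # spectrum sample being acquired
        2: (0, 0),   # both low again
        3: (0, 1),   # pitch sample being acquired
    }
    return schedule[zone % 4]


if __name__ == "__main__":
    print(zone_length_us(), 4 * zone_length_us())
    print([sample_hold_levels(z) for z in range(4)])
```

Four zones of 35 µsec give the 140 µsec sample period, and each 30 µsec conversion fits inside a single zone.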

At the output side, the reconstructed samples are fed first into a FIFO.
This is done in order to bring their rate back to the constant 140 µsec. The
clock used to store the samples in this FIFO is the internally generated
program execution rate, which varies (as discussed in the previous section)
from approximately 120 µsec to 150 µsec. The average rate over one-half a frame
is 140 µsec. Thus, extracting the samples from the FIFO at exactly 140 µsec
effectively eliminates the internal varying rate. These samples are then put
through a low-pass filter identical to the 3.3 kHz presample filter.
De-emphasis may be added at this point to compensate for the input
pre-emphasis. However, the quality of speech was found to be more natural
without it and so it was left out in our system.

A power  amplifier  feeding  earphones  or  a loudspeaker  completes  the  audio 
system. 

VI. PROGRAMMING

At the end of Section 4 the format in which a program line would be written
was already indicated. The program step is just a consecutive number and
conveys no information other than that of position. The real information
content is in the logic and the arithmetic parts. The two are separated by a
semicolon to denote that execution of the former is one time zone ahead of the
latter. The following is a list of mnemonics used to denote various operations,
together with comments and explanations. The simpler operations, such as the
read or write, are strictly one-zone operations (logic in this case); however,
more complex commands, like the COMP - IF for example, require both logic and
arithmetic sections. The listing therefore is in three parts: logic, arithmetic
and combined operations.

6.1  Logic  Commands 


1.  X → Y
    Read constant at address X in AROM and write into Y, where Y can be
    either the CB or the CBI buffer.

2.  JX → Y
    For J = A, read X in RAM A and store in Y. Y may be the AB or BB buffer
    or a location in RAM B. For J = B it is a read from RAM B and into AB,
    BB or a location in RAM A.

3.  WR RA at X
    Write content of WR MUX output into RAM A at location X.

4.  WR RB at X
    Write content of WR MUX output into RAM B at location X.

5.  WR AX → RB at Y
    Write content of RAM A at X into RAM B at Y. The dual with A and B
    reversed is also permissible.

6.  DO DF, DL, N
    Execute program lines DF up to and including DL, N times.

7.  CHA
    Channel the A input through the Address C ALU.

8.  CHI_X
    Channel I_X to the output of MUX Y.

9.  GO TO N
    Unconditional jump to program step N.

10. EXP (X) → CB
    Write content of the A_2 ROM, exponent section, address X into CB.

11. ALOG (X) → CB
    Write content of the A_2 ROM, ALOG section, address X into CB.

12. CLIX
    Unit increment I_X.

13. CBI = CBI + 1
    Unit increment content of CBI.

14. Set I_X = N
    Set I_X to be equal to N, where N can be an arbitrary integer
    (including zero).

15. CHACCB
    Channel ACCB (8 top bits only) through MUX Y.

16. CHACCB, CHI_X
    Channel content (8 top bits only) of AB through MUX Y.

17. INH
    Inhibit the effect of the least significant bit out of MUX Y on the
    DOIB command.

18. CMCL
    Half frame clocking computed in software.

19. HAD
    Arithmetic operations under Hadamard matrix control.

20. BL
    Blank; no operation.

6.2  Arithmetic  Commands 


1 . ZERO 

Shift  zero  through  MUX  A. 

2 . X,  ACCA 

Clock  X into  ACCA. 

3.  X ACCB 

Clock  X into  ACCB. 

4.  X + Y 

Add  X to  Y. 

5.  X - Y 

Subtract  Y from  X. 

6. MULT

Multiply; the content of CB is multiplied
by the content of the multiply register,
which holds the past ALU output, and the
product is clocked into the PHR at
the end of the CLL clock period.
7. MOD (X)

The modulus of X appears on the output of
ACCB. X is the number coming through the
arithmetic ALU.

8. DOIAR (X OP Y)

Direct or inverse operation in arithmetic
ALU. Execute X OP Y if the least significant
bit out of MUX Y (LSBY) is zero, or the
inverse operation, X OP' Y, if it is one.
Two pairs of OP and OP' are (plus, minus)
and (Shift A, Shift B).

9. DOIB (X OP Y)

Direct or inverse operation in ACCB. Same
as above, but the operations are with respect
to ACCB. These are given in Table
5.8. In addition, the control is LSBY ^
MSB ACCA. If this is 1, the direct operation
results; if 0, the inverse.

10. COMP> (A,B), (OP1, OP2) X Y

If AB > BB do A OP1 X and write into Y;
if AB ≤ BB do B OP2 X and write into Y.
COMP< and COMP= are analogous. For
greater detail see Section 5.7.
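
The selection performed by DOIAR (item 8) can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes LSBY = 0 to select the direct operation and LSBY = 1 the inverse, it models only the plus/minus pair, and the names `doiar` and `OP_PAIRS` are hypothetical.

```python
import operator

# (direct, inverse) pairs. Only plus/minus is modeled; the Shift A / Shift B
# pair is left out because its exact behavior is not specified here.
OP_PAIRS = {
    "+": (operator.add, operator.sub),
    "-": (operator.sub, operator.add),  # assumption: minus inverts to plus
}

def doiar(x, op, y, mux_y_out):
    """Return X OP Y or its inverse, chosen by the LSB of the MUX Y output.

    Assumption: LSBY = 0 selects the direct operation, LSBY = 1 the inverse.
    """
    lsby = mux_y_out & 1
    direct, inverse = OP_PAIRS[op]
    return direct(x, y) if lsby == 0 else inverse(x, y)


# Sign-controlled accumulation: each control bit decides add or subtract.
acc = 0
for sample, control in zip([3, 5, 2], [0, 1, 0]):
    acc = doiar(acc, "+", sample, control)
```

With control bits 0, 1, 0 the accumulation evaluates 0 + 3 - 5 + 2, which shows how a single program line can add or subtract depending on a data bit.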

6.3  Joint  Commands 


There are two such commands: the IF jump and the COMP-IF. Both require
an arithmetic operation followed by a logic command in the next line. The IF
appears as follows:

If > 0 → a, = 0 → b, < 0 → c;

This means that, depending on the outcome of the comparison between A and B,
where A is the word in the A channel of the arithmetic ALU and B is the B channel
word, go to address 'a' if A > B, to address 'b' if A = B, and to address 'c' if
A < B. This is a three-way decision.
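
The three-way decision can be expressed as a short Python sketch. It assumes the comparison is made on the sign of A - B, as the "> 0", "= 0", "< 0" notation suggests; the function name `if_jump` and its parameters are hypothetical.

```python
def if_jump(a, b, addr_gt, addr_eq, addr_lt):
    """Return the next program address for the three-way IF jump.

    Assumption: the test is made on the sign of (a - b).
    """
    diff = a - b
    if diff > 0:
        return addr_gt   # A > B: go to address a
    if diff == 0:
        return addr_eq   # A = B: go to address b
    return addr_lt       # A < B: go to address c


next_step = if_jump(7, 7, addr_gt=100, addr_eq=200, addr_lt=300)
```

In this usage A equals B, so `next_step` is the middle address, 200.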

For the COMP-IF, using the COMP= as an example, a typical program line
might look as follows:

; COMP= (AB, ACCA) (+, SFTA) CB ACCA

If YES N;

This means that if the condition is not met (irrespective of what the operations
are) go to program step N; otherwise continue sequentially.
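
A minimal Python sketch of the COMP= plus IF pair follows. It is an illustration under assumptions rather than a model of the hardware: the two operations are passed in as ordinary functions (plus and minus stand in for the pair, since the shift semantics are not modeled), "continue sequentially" is taken to mean the step after the IF line, and the name `comp_eq_if` is hypothetical.

```python
import operator

def comp_eq_if(a, b, op1, op2, x, if_step, n):
    """Model COMP= (A, B) (OP1, OP2) X -> Y followed by the IF line.

    Returns (value written to Y, next program step).
    Assumption: `if_step` is the step number of the IF line itself.
    """
    if a == b:
        # Condition met: do A OP1 X and continue sequentially.
        return op1(a, x), if_step + 1
    # Condition not met: do B OP2 X, then jump to program step N.
    return op2(b, x), n


met = comp_eq_if(4, 4, operator.add, operator.sub, 3, if_step=20, n=50)
not_met = comp_eq_if(4, 9, operator.add, operator.sub, 3, if_step=20, n=50)
```

In both cases a result is written to Y; only the next program step differs, which matches the remark that the jump is independent of what the operations are.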

VII.  CONCLUSIONS 

One of the consequences of the design approach described in this report is
the constant need to re-assess the effect of the most recent modification on
the rest of the machine and then take appropriate action. However, when the
final modification is reached and the system works successfully, the
incentive to go back and rework the system for a more elegant solution is lacking.
If, therefore, the machine described here seems in places capable of obvious
improvements that were not made, this is mainly due to the lack of time for
a more elegant solution.

The  final  machine  built  has  the  following  statistics: 

Power consumption: 22 amps at 5 volts

^ 0.3  amps  at  15  volts - 

Size: 451 DIPs, 85% Schottky TTL, rest regular TTL

3-1/2  Augat  boards  of  digital  hardware 
1/2  Augat  board  of  analog  hardware 
Fits  into  one  standard  drawer. 